




The Journal of
Educational Psychology


Devoted Primarily to the Scientific Study of Problems of 
Learning and Teaching. 














BOARD OF EDITORS:

HAROLD RUGG, Chairman.
Lincoln School of Teachers College.

RUDOLF PINTNER,
Teachers College, Columbia University.

Carnegie Corporation, New York City.

J. CARLETON BELL,
Brooklyn Training School for Teachers.

LEWIS MADISON TERMAN,
Leland Stanford University.

FRANK NUGENT FREEMAN,
University of Chicago.

EDWARD LEE THORNDIKE,
Teachers College, Columbia University.

ARTHUR IRVING GATES,
Teachers College, Columbia University.

VIVIAN ALLEN CHARLES HENMON,
University of Wisconsin.

LAURA ZIRBES, Assistant Editor.
Lincoln School of Teachers College.








OCTOBER, 1924          $4.00 a year.











CONTENTS

THE PSYCHOLOGY OF THE GESTALT. George Humphrey . . . . . 401

THE TRANSFER OF TRAINING IN RELATION TO INTELLIGENCE . . . . . 413

THE ACCURACY OF CERTAIN STANDARD TESTS FOR SCHOOL SECTIONING AND MARKING. Percival M. Symonds . . . . . 423

SPURIOUS CORRELATION AND RELATIONSHIP BETWEEN TESTS. Godfrey H. Thomson and Rudolf Pintner

ON SCORING MULTIPLE RESPONSE TESTS. Karl J. Holzinger . . . . . 445

A STUDY OF THE PREDICTIVE VALUE OF CERTAIN KINDS OF SCORES IN INTELLIGENCE TESTS. William M. Brown . . . . . 448

THE STANDARD ERROR OF CERTAIN ESTIMATED COEFFICIENTS OF CORRELATION. Eugene Shen . . . . . 462

REQUEST FOR INFORMATION IN A STUDY OF THE EFFECT OF ENVIRONMENT ON INTELLIGENCE. Frank N. Freeman . . . . . 466

NEW PUBLICATIONS IN EDUCATIONAL PSYCHOLOGY AND RELATED FIELDS OF EDUCATION








Published Monthly Except June to August by 
WARWICK and YORK, Inc., 
York, Pa. Baltimore, Md. 























Entered as Second Class matter November 15, 1921, at the Post Office at York, Pennsylvania, under the Act of March 3, 1879.



















The Journal of 
Educational Psychology 


is published monthly, by Warwick & York, Inc., Baltimore,
and York, Pa. Title page and
December number.


Psychological Principles
Applied to Teaching:
A Manual for Teachers.

By
William Henry Pyle.

This manual gives a fairly complete list of those psychological principles that have definite application to the specific problems that arise in the schoolroom. A sufficient number of applications of each principle is given to make clear how the principle is to be interpreted and to what class of teaching problems it is applicable.

Price $1.50 + 8¢ postage.














Nowadays

Every progressive school teaches current events in some form. It is a vital part of the training of our boys and girls for good citizenship.

For Twenty-three Years

there has been but one standard text. CURRENT EVENTS is used and approved in the public and private schools of every important city and nearly every town and village in the United States, in every Territory and possession, and in twelve foreign countries.

In clubs only 30 cents per pupil for the year ending in June. Rates and sample copies free to teachers upon application.

CURRENT EVENTS

Columbus, Ohio

South Wabash Ave., Chicago          460 Fourth Ave., New York

















Warwick & York, Inc.
PUBLISHERS.
BALTIMORE.






















THE JOURNAL OF
EDUCATIONAL PSYCHOLOGY

Volume XV          October, 1924          Number 7

THE PSYCHOLOGY OF THE GESTALT

SOME EDUCATIONAL IMPLICATIONS

GEORGE HUMPHREY
Queen's University
"The tree," says Mach, "with its hard rough gray trunk, its numberless branches swayed by the wind, its smooth soft shining leaves, appears to us at first a single indivisible whole."¹ There follows the classical analysis that shows how such a thing as a tree can be analysed into elements which are common to more than one complex, such as the green colour and the "touch" of the tree, to which Mach adds the "sensations of space." According to Mach, "colours, sounds, spaces, times" [the sensations] "are the ultimate elements whose given connection it is our business to investigate."

The upholders of the Gestalt² theory maintain that this is a spurious analysis. The actual data of experience, they claim, are wholes such as Mach's tree, and we never have an experience that is not such a whole. Psychology, they claim, is the study of these whole-data of consciousness and behaviour, their interaction, the conditions of their establishment and the laws of their change. Thus is established a new kind of psychological unit: one which further division alters, rather than one which cannot be further subdivided, just as the chemical molecule can be subdivided, but not without changing its nature. The sensation and the isolated stimulus are never experienced, and in their place in our psychological thinking we must substitute the actual data of experience, wholes and whole processes. We never perceive the colour green, the "touch" of the tree, the separate "sensations of space"; and although an artificial analysis into sensations may seem scientifically useful, it is claimed that this analysis does subtle violence to the facts. The argument is the same whether our psychology leans towards the subjective or the objective side.

¹ "Beiträge zur Analyse der Empfindungen." 5. vermehrte Aufl., 1906, S. 85. (I have used Williams' translation, reprinted by Rand in "The Classical Psychologists," p. 611.) In the course of the present paper, the word Gestalt is often translated by "structure," sometimes by "form." Neither is entirely adequate. The first has psychological implications, the second metaphysical implications. Gestalt should not be confused with the older Gestaltqualität.

² The chief exponents are Wertheimer, Köhler and Koffka. See the latter's Perception, Psychological Bulletin, October, 1922, for an authoritative exposition in English.

Let us take an example. If a beam of light is projected at regular intervals through a slit onto a screen in a darkened room, and one watches the screen, there results in consciousness a series of white lines on a dark background. By varying the time interval between the flashes, the phenomenal series can be made slower or more rapid within certain limits. Suppose now a second beam of light is projected closely following the first in time, and spatially a little away from it on the same screen. Objectively we then have two lines, one a little above the other, separated by a short interval of time, then a period of darkness, and then again two lines in the same arrangement, and so on. Objectively considered, the addition of the second beam merely adds a second line. However, the actual effect on the organism is different. Under these circumstances one sees perhaps at first two waving lines, flickering in the most extraordinary way, and if now the time interval is carefully adjusted one sees no longer two lines but a single one moving backwards and forwards with an oscillatory motion.¹ The stimulus is qualitatively different. The addition of the second line has profoundly affected the stimulus value of the first. We have not a simple addition of stimuli, or of phenomena in consciousness, but a psychological unity which cannot be built by the mere juxtaposition of its component lines, and cannot be analysed into those lines without doing violence to the facts. Thus the idea of association as the fundamental mental and behaviouristic mechanism is contradicted. Association implies the tacking on of a new element, and assumes that the old is fundamentally unaltered by the new. This experiment implies that the addition of a new element changes the effect of the elements already present, and itself has an effect that differs according to the situation already present.

¹ A simple case of the now famous "phi-phenomenon." See Wertheimer, Zeitschrift f. Psychologie, Vol. LXI, 1912, 161 ff., who has shown that the explanation cannot be found in "past experience," addition of stimuli, etc., and gives the explanation of a physiological short circuit.

The whole process cannot be split up, Wertheimer has shown, into component parts. To attempt this would be to do violence to the facts. The process is a fundamental whole, a Gestalt, and this simple Gestalt is typical of our whole mental life. Mental life, according to the theory, is not to be considered as made up of a multitude of discrete, hypothetical sensations trailing clouds of feeling and emotion. We must divide it rather into its natural units, which are the Gestalten, and many phenomena which we have thought to involve the hypothetical sensation can, it is claimed, be seen to be internal processes within these Gestalten. Such an explanation is given of attention,¹ the "psychological maid of all work," and of reasoning,² as well as of many other facts now noticed, it is claimed, for the first time. In the same way, memory,³ according to Wertheimer, "groups itself . . . by peculiarities affecting the whole" of the thing remembered "and by structural articulations." We shall later return to the work on memory.

¹ See Rubin, "Visuell wahrgenommene Figuren," Berlin and London, 1921. Rubin describes the phenomena of figure and ground without using the concept of attention and concludes that the term is "used mainly in such realms as are not exactly explored." Cf. Wertheimer, below, for further criticism of the current concept of attention; elsewhere he gives interesting experimental results on the problem.

² Seeing and solving the problem come through "concrete, characteristic, limited Gestalt-processes." Cf. Wertheimer, Untersuchungen z. Lehre v. d. Gestalt. Psych. Forschung, Vol. I, 1921, p. 56.

³ Ibid.

⁴ Nachweis einfacher Strukturfunktionen beim Schimpansen und Haushuhn. Abhandlungen der Preussischen Akademie der Wissenschaften, Vol. V, No. 2, 1918.

Let us now take an example explicitly involving behaviour. In a monograph characterized by technical acumen and scientific insight of the highest order, Köhler has shown⁴ that chimpanzees, hens and a three-year-old child, when confronted by a pair of grays, one lighter, the other darker, do not react to them absolutely, that is to say, always to the particular colour to which they have been trained. Rather, the stimulus in such a case operates differentially, the organism reacting to the lighter or the darker of the two colours, whatever the absolute lightness or darkness may be. Thus take three grays A, B, C in ascending order of darkness. An animal is trained to choose B, the darker, when confronted with the pair A, B. It is then confronted with the pair B, C, and Köhler has shown that there is a very strongly marked tendency towards the choice not of B, but of C. That is to say, the stimuli are to be considered not as separate, discrete "additive" entities, but as forming for the organism an irreducible whole, a whole which is fundamentally the same in the two cases. Break up the whole situation into what might seem at first sight to be its component elements, and you do violence to the facts. Köhler even claims it is possible that reaction to a single stimulus requires a distinct type of analysis, consisting in the "freeing of a single absolute datum from a strong bond."¹ There is evidence of such "structural functions" and corresponding behaviour all along the animal scale.²
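Köhler's transposition result can be made concrete by comparing two candidate response rules. The following is a minimal sketch, not Köhler's procedure: the darkness values for A, B and C are hypothetical numbers that preserve only their order, and the function names are illustrative. Both rules agree on the training pair; only the transposed pair separates them.

```python
# Minimal sketch (illustrative only): an "absolute" versus a "relational"
# response rule applied to Köhler's three grays A, B, C.
# ASSUMPTION: the darkness values are hypothetical; only their order
# (A lighter than B, B lighter than C) reflects the experiment.
A, B, C = 0.3, 0.5, 0.7


def absolute_rule(pair, trained=B):
    """Respond to the trained gray itself, if it is present in the pair."""
    return trained if trained in pair else None


def relational_rule(pair):
    """Respond to the darker member of the pair, whatever its absolute value."""
    return max(pair)


training_pair = (A, B)
test_pair = (B, C)

# On the training pair both rules select B, so training alone
# cannot distinguish them.
assert absolute_rule(training_pair) == relational_rule(training_pair) == B

# On the transposed pair they diverge.
print("absolute rule picks B:", absolute_rule(test_pair) == B)
print("relational rule picks C:", relational_rule(test_pair) == C)
```

Köhler reports a strongly marked tendency towards C, not an invariable choice; the sketch shows only which of the two rules that tendency agrees with.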

The same worker has laid down an interesting physiological theory in his book on the physical Gestalt. In this astonishingly able volume³ it is claimed that there are in physical nature structures which obey much the same kinds of laws as do the psychic structures; that there is evidence that a stimulated sense organ, particularly a stimulated retina, should be considered as such a physical system; and that this involves the immediate perception of form, as opposed to the "production" of form by higher processes.⁴ The crux of the whole position is here the electrical and other changes which must take place when a fresh center of excitation is set up on the retina. To quote from Köhler: "To different kinds and different intensities of stimulation different chemical reactions will, in general, necessarily correspond" in somatic fields,⁵ and "the electro-motor force at the limits of two differently excited regions of a (somatic) field is a simple function of both conditions of excitation" (Erregungszustand), this force being the evidence of the inhomogeneity of the retinal chemical processes. Thus "with every perception of form the optical field suffers a leap of potential along the boundaries of demarcation"⁶ (massgebenden Grenzen). It follows that the perception of form is part of the primary visual datum, and is not added in "the little known region of higher psychic processes." If this reasoning is accepted, it is manifestly impossible to consider the "complex psychic event, which corresponds to a given visual field, as a mosaic of single excitations with a true geometrical distribution, and in true geometrical proximity. Something more than this is going on in the somatic fields," namely, the interaction of part with part and with the whole.

¹ Ibid.

² Ibid.

³ "Die physischen Gestalten in Ruhe und im stationären Zustand." Braunschweig, 1920.

⁴ As is claimed, inside the original movement, by Benussi.

⁵ Op. cit., p. 4.

⁶ Ibid., p. 26.







Thus is provided an interesting theoretical justification for the deductions which Wertheimer made from his original experiments on apparent motion, called by him the phi-phenomenon. It is indeed Köhler's acute theoretical approach that has synthesised the whole movement, raising it from the rank of a brilliant series of aperçus to a unified body of doctrine resting on the legs of both experimental result and theoretical justification. Some indeed have claimed that the speculation of the "Physical Gestalt" is premature, and these prefer to abide by the earlier experimental results. But a theoretical basis the experimental results must have, and the views advanced by Köhler have at least the merit of explaining many of the facts with relatively few assumptions.

One other conception should be discussed, namely, what is called the "pregnancy of the Gestalt." In his book Köhler has shown the existence, in a physical structure, of a "tendency to come to a condition of simple structurisation . . . with a minimum of structural energy." He quotes, as an example of the kind of thing he means, Gelb's experiments.¹ Gelb reported that three successively exposed points of light tended to arrange themselves in perception as a symmetrical group, even though they were objectively separated by different distances. At the same Congress, Benussi,² who had been experimenting with illusions of motion on the skin, spoke of "a tendency towards simple, more usual forms," an expression to which Köhler takes exception as having certain theoretical implications. In the discussion other instances were given, Rupp quoting an experiment of Bourdon's showing that "when three, four, five, or six points are arranged regularly round a center, the impression received is of a three-, four-, five-, or six-sided figure. On the other hand, if eight points are used, there is not necessarily a perception of an octagon but rather of a circle." In the same discussion Goldschmidt claimed that he had exposed faint polygonal, elliptical and cross-shaped visual stimuli, which had all been perceived as vague circular "mists of light," and gave other interesting instances. These experiments show a tendency to a sort of consolidation, whereby the Gestalt becomes, in Wertheimer's phrase, "as good as possible." This consolidation, this tendency towards simplicity, shows itself, it is claimed, not only in the formation of the phenomenal perception; it also accounts for the change that takes place in a Gestalt with the lapse of time.

¹ Gelb, Adhémar: Versuche auf dem Gebiete der Zeit- und Raumanschauung. Bericht ü. d. VI. Kongress f. exp. Psychologie, 36 foll.

² Ibid., p. 148.

Thus Wolf¹ finds that when a sector between two concentric circles is shown to an observer who is later asked to reproduce it, the double sector is reproduced with a larger radial angle. Many other figures were subjected to the same experiment, and all suffered alteration of one of two types, each of which works in the direction of a "better" Gestalt. There are other instances of the same law scattered through the literature. For example, Gelb and Granit² found that the threshold was higher in the figure than in its ground. The effect of this will be, it is claimed, that the peculiar characteristics of the figure are to some extent offset, thus making the whole Gestalt more homogeneous and tending again to a "better" Gestalt. This notion of "pregnancy" or "consolidation" is considered one of the most important, if one of the more obscure, parts of the theory.

Such, in the briefest possible outline, is the psychological theory of the Gestalt. It is important to remember that, as Koffka³ points out, the theory has not been accepted by the majority even of German psychologists, and that, far from being the abstract theoretical thing which any summary presentation must make it appear, it is really deep-rooted in experimental results. It originated in a purely experimental investigation, and, except in some of its latest developments, which are perhaps not so fortunate as the earlier work, it has continued as a working hypothesis for experiment. The ramifications of the idea are beginning to spread very widely, embracing as they now do the fields of social and esthetic psychology, and it begins to suffer, as usually happens in such cases, from the enthusiasm of its friends.
Attacks have been made on it by many formidable opponents. It is not the object of this paper to examine their objections. It is probable that when the dust has died down many of them will be found to have been justified, and that the theory will have to be modified accordingly. It is certain, too, that some of the results of the movement have been anticipated by previous theory. Such is the case with all scientific innovation. But the theory claims to have given new meaning and new experimental evidence to such previous theories, and to have brought a large number of previously known facts under one common explanation, as well as to have explained facts for which no explanation was hitherto forthcoming. It claims in addition to have discovered a large number of new facts. Although some of its more theoretical aspects may require revision, and further research may render many of its positions untenable, this product of a brilliant group of German scientists certainly deserves the respectful consideration of forward-looking psychologists.

¹ Psychologische Forschung, 1922, pp. 332-373.

² "Bedeutung von 'Figur' und 'Grund' f. d. Farbenschwelle." Zeitschr. f. Psych., 1923, pp. 83 foll.

³ Perception. Psychological Bulletin, October, 1922.

Any dogmatic application of the theory to educational practice would be premature. The theory is too new, and the complexities of the modern educational situation too great. It is perhaps better to state the educational implications as a series of problems raised by the theory of the Gestalt in the field of education. The educator will think of the problem of reading, where modern experiment has shown that the proper teaching unit is the word or even the whole sentence. In the activity of reading, the natural unit is clearly something larger than the single letter. It is interesting to students of education that something very like a Gestalt-Theorie of reading was evolved in 1898 in the Erdmann-Dodge monograph, and that the point was brought out still more clearly in the controversy that followed with Wundt.¹

The primary problem of educational psychology is the problem of learning. Stated objectively and in its baldest way, this problem is as follows. Every organism in certain surroundings is capable of making certain responsive movements, or taking up certain responsive attitudes. If we examine the responses of the same organism on different occasions to the same environment, we sometimes find a progressive change in the direction of what is called better adaptation. We have used the phrase "to the same environment" with reference to the successive occasions calling out the learned response. This really begs a debatable psychological question: the question of the transfer of training. To the understanding of this difficult problem the concept of the Gestalt seems to offer a contribution. In an article which was really the historical starting point of the theory, von Ehrenfels postulated for the Gestaltqualität that it could be transposed "like a melody."² A thing is learned in one connection; the reaction or perception is possible in an entirely fresh set of circumstances, provided only that certain conditions are observed, just as it is possible to begin the same musical scale with any note.

¹ Erdmann und Dodge: "Ueber das Lesen." Halle, 1898. See p. 161 foll. See also Wundt in Philosophische Studien, Vol. XV, Chap. 3, 1899, p. 287 foll., and ibid., 1900, Vol. XVI, p. 61 ff.; and Erdmann-Dodge, Zeitschrift f. Psychologie, 1899, Vol. XXII, p. 241 ff.

² Ueber Gestaltqualitäten. Vierteljahrsschrift f. wissensch. Philosophie, 1890.

This is what seems to have happened at a very elementary stage in the case of Köhler's hens and chimpanzees. Köhler, it will be remembered, found that the decisive objective condition for reaction of a certain type was not the "absolute" colour (in the sense in which the physicists use the term) but the fact that such colours bore a certain relation to one another. In fact, the same reaction took place with different "absolute" stimuli. Have we not here a species of transfer which satisfies neither Thorndike's theory of "identical elements" nor Judd's theory of "generalization," nor any other of the classical transfer-hypotheses? It is interesting to note that the position seems to imply a reconsideration, at least, of a mechanistic assumption, often tacitly made, that with the same subjective conditions the same reaction is possible only when the absolute stimuli are identical. The position seems also to be supplementary to the Verworn definition
of a stimulus as any change in the external vital conditions. Köhler's hens reacted to a continuous, as opposed to a discrete, situation, while Verworn seems to imply a discrete event. The above example of transfer of training is excessively simple, and perhaps on that account somewhat artificial, but the possibility of extension to more complex cases can be readily seen if one remembers that according to Köhler's physiology we must think of cerebral potentials rather than of isolated nervous impulses. It should clearly be possible to set up the same difference of potential between different cerebral regions with very different absolute stimuli. At the least, we have a working hypothesis for further experimentation on the problem of transfer.
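Von Ehrenfels' melody and the "same difference of potential with very different absolute stimuli" both describe a form that survives a change in every absolute element. A minimal sketch, assuming pitches are written as MIDI-style semitone numbers (an encoding chosen here purely for illustration, not used in the original): transposing a scale alters every note in every position, yet the sequence of intervals, the relational pattern, is unchanged. Position by position no element is shared, so whatever carries over cannot be an identical absolute note.

```python
# Minimal sketch: transposing a melody changes every absolute pitch but
# preserves its pattern of intervals.
# ASSUMPTION: pitches are encoded as MIDI-style semitone numbers
# (60 = middle C); this encoding is illustrative only.


def intervals(notes):
    """Successive pitch differences, in semitones."""
    return [later - earlier for earlier, later in zip(notes, notes[1:])]


def transpose(notes, shift):
    """Move every pitch by the same number of semitones."""
    return [note + shift for note in notes]


c_major = [60, 62, 64, 65, 67, 69, 71, 72]  # the scale begun on C
g_major = transpose(c_major, 7)             # the same scale begun on G

# Every note in the same position has changed...
every_note_changed = all(a != b for a, b in zip(c_major, g_major))
# ...yet the relational structure is identical.
same_form = intervals(c_major) == intervals(g_major)

print("every note changed:", every_note_changed)  # True
print("same interval pattern:", same_form)        # True
print("intervals:", intervals(c_major))           # [2, 2, 1, 2, 2, 2, 1]
```

Any nonzero shift gives the same result, which mirrors "begin the same musical scale with any note."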

Coming back to the central problem of learning, it has been said that the fundamental question before psychology today is the method of elimination of useless movements in learning. Here again the Gestalt-Psychologie offers a working hypothesis for experimentation, tentatively suggesting a principle of wider application in place of the ad hoc explanations usually given. It is claimed by Koffka¹ that we can apply the term Gestalt to series of actions directed towards a definite end, such as the movements of an animal acting instinctively or those of a chimpanzee striving to reach a banana. Now there seems good reason for applying the word Gestalt to what is generally known as a purposive action. The component parts of such an action satisfy all the Gestalt criteria of von Ehrenfels and Köhler.

¹ Koffka: "Die Grundlagen der psychischen Entwicklung." Osterwieck am Harz, 1921, p. 74 and passim.

These criteria are three. First, the structure is more than its parts (the example given in the original connection is that if the notes of a melody are each given to a different man, the resulting effect is not the melody, to obtain which the parts must be brought into "functional relationship" with one another); second, the structure can be transposed; third, a criterion added by Köhler, separation brings alteration of the parts.¹

Now it is clear that the purposive action satisfies all these tests. If 20 dogs each made a motion corresponding to one of the motions a dog goes through while he is chasing a cat, the result is not the same thing as if all these movements took place in the same animal. The movements, when they are out of "functional relation," are different from the same movements in such a relation. In the same way, the same general act can take place under different conditions. A dog can chase a cat through a swamp, down a street, up a stairway. The act of chasing a cat is "transposable." Further, if one of the component actions of the dog is taken away, or a fresh one put in, the other acts have to be altered. If the dog meets an obstacle during the chase, he must go round it or otherwise surmount it, which will mean that he will have to change the character of the rest of his hunt.

A word of caution is here necessary. The application of the Gestalt idea to dynamic occurrences rests on a much less secure foundation than the rest of the work. Koffka² states that the same principles apply, though "with far greater difficulties," and it is but fair to say that these difficulties may ultimately prove insurmountable. On the other hand, it is also fair to state that the notion provides, again, a reasonable working basis for experimentation, which is a service of a high order.

¹ "Physischen Gestalten," p. 42.

² Ibid., p. 75.

If then one will allow the term "Gestalt" to a "purposive" series of actions, the question at once arises whether the Köhlerian theory of pregnancy does not apply, according to which the elimination of useless movements would be conceived as a process of Gestalt consolidation. On this view, it would be expected that, after an organism has once responded in a certain way to a certain outside situation, there will be a subtle change in the organism which will make itself felt when a similar situation is again experienced, corresponding to the change which took place in "memory" in Wolf's experiments described above. It is possible, too, that those stimuli which harmonized with the now changed Gestalt would be "satisfiers," the inharmonious ones "annoyers," and if this is the case we have, again, a working hypothesis for experimentation upon that standing puzzle, the fact that a satisfying act can affect actions that come considerably before it in the series. Each member of the Gestalt series is, of course, in close functional relation with the other members and with the whole. To admit this intermediate consolidation between periods of stimulation is to revive the theory of "learning to skate in the summer and to swim in the winter," a theory against which there is some experimental evidence. One should here notice also that the view implies, further, a criticism of the term "trial and error," which carries the implication of a number of separate attempts to solve the problem. If one adopts the Gestalt idea, one should think rather of the different phases of a single attack on the problem, each phase being influenced by the fact that it is, with the other phases, part of the unitary attempt at solution. This, with the "pregnancy" idea, provides at least a hypothesis of how the animal actually learns, for example, to get out of the maze, without resorting to mysticism on the one hand or begging the question on the other. It should be remembered that the notion of the "pregnancy" of the Gestalt is admittedly obscure, and "cannot yet be stated unequivocally," especially in connection with series of actions. But it seems at least to give a fruitful hypothesis of learning for educators and others to work upon. One may add also that the view is in opposition to the notion of chance movements in this type of learning. It stresses the "sensory" as well as the "motor" aspect of the process, and conceives the sensory data as well as the motor elements to be governed by definite structural conditions. The relation between the sensory and motor aspects shares the obscurity of other theories of the subject.

Two other important educational considerations arise from the 
foregoing discussion. One concerns the instinct. In conformity with 
the root idea of the Gestalt, we find the instinct considered by members 
of the school as a ‘“‘configuration . . . of response for which the 
organism is predisposed.”! This seems to be begging the question 
at least in its modern form. Are there groups of responses which do 


not depend on experience? If so, what are they? It would appear | 


that the discussion of the nature of such groups is premature at least 
until their existence has been more definitely demonstrated by a 
conclusive answer to modern destructive criticism. The other point 
concerns the analysis of the learning process. In the stimulating 


1 Ogden, R. M.: Need of Some New Conceptions in Educational Theory and 
Practice. School and Society, Vol. XVIII (Sept. 22, 1923), p. 456. 





ss Fast ® 4.°3 


oe 


7 wm rw 3s hlUmDlUlCtmS COC OlUSlCUCUOlUCOClUC DTC KCL CO 


., > © — 


as a a —_. mr ~~ —_ a, Ae 





aon =m ~~ 


> 0 S O&O 





The Psychology of the Gestalt 411 


paper above referred to, Professor Ogden has mentioned with some 
disfavour the work of Pavlov in attempting to isolate and study one 
external factor as a sample of the multitudinous factors involved in 
the complexities of ordinary learning. Professor Ogden, in effect, 
denies that the isolation is real. It is the writer’s belief that the 
Gestalt theory would admit that Pavlov, to all intents and purposes, 
actually succeeded in isolating a single factor, but that this astonishing 
technique has only given us a limiting case. Kéhler’s point is that to 
take such a single factor of “‘moment” out of its context is to change it. 
That is part of his definition of the Gestalt. Just, as perceptually, 
we cannot, according to Wertheimer, isolate a single element of con- 
sciousness without doing violence to the facts, so in behaviour we 
cannot analyse out a single stimulus. To do so is in each case to 
adduce a limiting case, from which no deduction can necessarily be 
made to the more complex instance, just as one cannot deduce the 
general properties of the conic section from those of the circle, though 
the converse is readily possible. As a controlled experimental 
approach to the problem of learning Pavlov’s magnificent work stands 
and will stand as a monument. It has not, however, been able to 
predict the behaviour of an animal confronted with a complex of 
stimuli, and it may be that the idea of the Gestalt will form the 
necessary supplement to the work of the great Russian. Much the 
same criticism might be brought against the isolation of simple 
problems in tests. It is recognized that such a test as the Army 
Alpha does not discriminate in the higher reaches of intelligence. 
The Gestalt theory would explain this as due in part to the erroneous 
assumption that this test, in common with most others, presupposes 
that a simple operation of the mind is not altered by being divorced 
from its context as part of a complexity, and also that a complex 
action can be expressed as the sum of a number of simple ones. The 
same kind of criticism should, as Professor Ogden points out, be 
introduced into our conception of the curriculum. Köhler’s notion of
cerebral “potentials” would involve a different hypothesis of the
effect of any single branch, with a more organic relation between 
studies than previous theory has accounted for. Practice has here 
probably outstripped theory. 

The status of the Gestalt in educational psychology is that of a 
working hypothesis which raises a number of important problems, 
a few of which have been touched upon in this article. It would be a 
great mistake to regard it, at the present time, as anything more than 






















412 The Journal of Educational Psychology 


this. But it has already inspired a number of interesting pieces of 
experimental work in pure psychology, and if it can perform the same 
function for educational psychology that will be a valuable service. 
Speaking summarily, one may say the theory is a plea for insistence 
on wholes and whole processes. This insistence, one may add as a 
final remark, is not confined in more recent thinking to psychology. 
In his essay on atomic theories, Professor Bragg,¹ speaking of matter,
electricity and energy, states that “Nature herself has already chosen
units for them. The natural unit does not, of course, bear any exact
connection with our own. . . . Nature has chosen to speak in a
certain language; we must get to know that language.” These
sentences might be taken as the text of the Gestalt movement. The
Gestalt theory sets up exactly such a new psychological unit, one
which is not, indeed, indivisible but which satisfies the condition for a
real natural unit that it cannot be further subdivided without suffering
an alteration of its essential nature. The final acceptance or rejection
of the theory, in whole or in part, waits upon future research. Whatever
the verdict, it is at least certain that the theory will have
performed a valuable service to education if it has only succeeded in
focusing our attention once more, in these days of “additive” tests
and “unit” courses, on the primary interdependence of mental
processes.


1 Edited by Marvin: “Recent Developments in European Thought.” Oxford
University Press, 1920, p. 216.










THE TRANSFER OF TRAINING IN RELATION 
TO INTELLIGENCE 


FOWLER D. BROOKS 
Johns Hopkins University 


In the literature on the subject little reference is made to the
relation between transfer of training and intelligence, and no study of
it seems to have made use of the intelligence-testing technique
developed in the last few years. Judd (1908), primarily interested in the
role of generalized experience, discussed “The Relation of Special
Training to General Intelligence,” but no intelligence tests were used, the
first revision of the so-called 1905 Binet Scale having just been issued.
Rugg (1916) analyzed and summarized the studies of transfer up to
that time. In his own study, he found transfer of training in
descriptive geometry positively related to scholarship in “disciplinary or
training courses, such as the various college courses in mathematics,”
but neutrally related to scholarship in English and modern languages.

To determine the amount of transfer from training in mental 
multiplication and substitution and its relation to intelligence, the 
following investigation was made in an elementary school in Baltimore 
in May and June, 1923. One hundred sixty pupils of Grades VIa,
VIIa, VIIIb, and VIIIa were selected to form the experimental and
control groups. 

Test Series I (T.S.I.) in Grades VIIa to VIIIa consisted of the
Courtis Arithmetic, Form B (division being given as mental division 
with a 10-minute time allowance), and the following tests devised by 
the writer: Two French vocabulary tests, two letter-digit substitution, 
one number cancellation, one a-t cancellation, a modified form of 
Binet’s simultaneous adding, immediate memory for words and digits 
(three- to eight-term, and three- to ten-term lists), drawing lines equal 
to 21 given lines of various lengths from 4 to 4 in., arranged in mixed 
order, and bisecting the given lines. Test Series I for the VIa classes 
included all of the tests for Grades VII and VIII except the substitu- 
tion and a-t cancellation tests. Courtis Division was not given as 
mental division, but according to Courtis’ directions. 


1 In the February number of this Journal (pp. 90, 97) Thorndike uses .09 as a 
conservative estimate of the true correlation between initial ability (in intelligence 
tests) and gains for pupils taking the same courses. Gain probably includes (a) 
a year’s general mental growth, (b) practice effect in taking the tests, and (c)
special (or general) training from the school studies. 

413 










Test Series II (T.S.II.) was the same as T.S.I., except that other 
forms of the French vocabulary, substitution, simultaneous adding, 
and immediate memory tests were used. 

The time allowed for the Courtis Tests (except mental division in
Grades VII and VIII) was shorter in the test series than is usual.
Experimental and control groups took the same test series and at the 
same time. T.S.I. was given during the three forenoons immediately 
preceding the training series, and T.S.II. during the three immediately 
following the training. Many of the VIa experimental pupils tired 
of their training series by the time it was half finished. Practically 
all of the Grades VII and VIII experimental pupils kept up their 
interest throughout the experiment. 

The training series in Grades VII and VIII consisted of the mental 
multiplication of two-place by two-place numbers 20 minutes per day 
on 14 consecutive school days. When the answer was obtained, it 
was written down beginning with the left-hand figure. No other use of 
pencil was allowed. Each day’s practice was divided into two 
10-minute periods, separated by a two- or three-minute rest period. 
Practice was given at the same time each day. Before beginning each
day’s practice, two or three minutes were spent talking over
improvement and noting each individual’s score for the previous day.
Problems were printed on 8½ by 11 in. paper and were before the pupils as
they solved them in a specified order.

The training series in the VIa class consisted of 15 minutes daily 
practice on 15 successive school days substituting letters for digits in 
15 forms of a substitution test devised by the writer. The work each 
day was done with only a very general knowledge of the results of the 
previous day’s practice. 

Intelligence was determined by giving the Otis Advanced Examina- 
tion, Form A, and Haggerty Delta 2, to all pupils in experimental 
and control groups. 

In Grade VIa, one class was chosen as the experimental group and 
two others as the control, those pupils being selected for this study 
who would make two equivalent groups. 

In Grades VII and VIII two classes were used. One was composed 
of VIIa and VIIIb pupils, and the other of VIIIa pupils. Each class 
was divided by the principal into two approximately equivalent groups. 
The two boys having the highest intelligence test scores in one class 
were selected. One was placed in the experimental and the other in
the control group for that class. The two having the next highest 













Transfer of Training 415 


scores were next selected and assigned, etc.; and similarly the girls, 
care being taken to keep the two groups alike in intelligence test 
scores. Whenever any two pupils having the same intelligence test 
scores were thought to differ in willingness to cooperate throughout the 
training series, the one more likely to do so was placed in the
experimental group, so that the results of the training might not be obscured
by having some pupils in this group so tired of the training series that 
they would not put forth their best effort in the second test series. 
This seems to have been successful, for practically every pupil in 
Grades VII and VIII experimental and control groups worked at a 
high level of effort in T.S.I. and T.S.II. 
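The pairing procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's procedure: the data layout, the function name `matched_pairs`, and the rule of alternating which group receives the higher-scoring member of each pair are all hypothetical, and the tie-breaking by willingness to co-operate is not modelled.

```python
from typing import List, Tuple

Pupil = Tuple[str, str, float]  # (name, sex, intelligence-test score)


def matched_pairs(pupils: List[Pupil]) -> Tuple[List[str], List[str]]:
    """Split pupils into experimental and control groups by matched pairs.

    Within each sex, pupils are ranked by score and taken two at a time;
    one member of each pair goes to each group. Assumption: the group that
    receives the higher scorer alternates from pair to pair. A pupil left
    unpaired at the end of a list is dropped.
    """
    experimental: List[str] = []
    control: List[str] = []
    for sex in sorted({p[1] for p in pupils}):
        ranked = sorted((p for p in pupils if p[1] == sex),
                        key=lambda p: p[2], reverse=True)
        for pair_index, i in enumerate(range(0, len(ranked) - 1, 2)):
            higher, lower = ranked[i][0], ranked[i + 1][0]
            if pair_index % 2 == 0:
                experimental.append(higher)
                control.append(lower)
            else:
                experimental.append(lower)
                control.append(higher)
    return experimental, control


# Hypothetical pupils and scores, for illustration only.
pupils = [("A", "boy", 120), ("B", "boy", 118), ("C", "boy", 110),
          ("D", "boy", 109), ("E", "girl", 115), ("F", "girl", 114)]
exp_group, ctl_group = matched_pairs(pupils)
print(exp_group, ctl_group)  # ['A', 'D', 'E'] ['B', 'C', 'F']
```

Alternating the assignment is one simple way to keep the two group means close; the source says only that care was taken to keep the groups alike in intelligence test scores.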

The equivalence of the groups may be estimated from Table I, 
showing the mean mental ages, the means and SD’s of chronological 
ages and of intelligence test scores. The experimental and control 
groups were very nearly equivalent. In the two upper grades the 
experimentals were younger than their controls, whereas the reverse 
was true in Grade VI. The chronological ages of the experimental 
groups were more widely dispersed than those of their controls. The 
most important difference is that the VIIa-VIIIb experimentals were 
10 months younger than their controls. 


TABLE I.—THE EQUIVALENCE OF EXPERIMENTAL AND CONTROL GROUPS

                     Number of     Chronological age    Mean MA,   Intelligence test scores
                     pupils        (months)             months     Mean       Mean    SD of Haggerty
Grade                Boys  Girls   Mean     SD                     Haggerty   Otis    and Otis scores
VIIIa
  Experimental        10    12     175      9.6         183        114.2      119.1   17.1
  Control             10    12     178      8.8         181        112.2      117.2   17.4
VIIa-VIIIb
  Experimental        11    11     161     11.8         165         97.5      102.2   15.5
  Control             11    11     171      9.3         164         97.4      101.9   17.9
VIa
  Experimental        18    18     154     14.2         151         85.4       89.2   17.0
  Control             18    18     151      9.9         155         91.2       81.0   17.0




















The mean score of each group in each test of T.S.I. and T.S.II.
was found. The improvement (positive or negative) was expressed
as a percentage of the initial score of that group in that test (Rugg, 1916).
One measure of transfer is the difference between the percentage 










TABLE II.—INITIAL SCORES, GROSS GAINS, GAIN PER CENT, AND THE PERCENTAGE
GAINS OF EXPERIMENTAL OVER CONTROL GROUPS (RESIDUAL GAINS)

[Table body not recoverable from the scan. For each grade group (VIIIa,
VIIa-VIIIb, VIa) the table gave, test by test, the T.S.I. mean, the gross
gain, the gain per cent, and the residual gain. The tests were the Courtis
arithmetic tests (attempts and rights), mental division, simultaneous
adding, cancellation (digits and a-t), substitution, French vocabulary,
immediate memory (digits and words in span), and errors in drawing and
bisecting lines; substitution and a-t cancellation were given in Grades
VII and VIII only.]

1 Number of digits right minus number of digits wrong.


gained by each experimental group over that gained by the
corresponding control group in each test. This difference is designated
as “residual gain.” The essential data are shown in Table II, from
which it appears that the greatest amount of transfer took place in
Grades VII and VIII to mental division (a function closely resembling
mental multiplication in that it involves mental multiplication of
two-place by one-place numbers), and to accuracy in cancellation and
judging length of lines. In Grade VI the most noticeable residual gain
is in accuracy of judging length of lines and in speed in some of the
Courtis Arithmetic Tests.
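The two quantities just defined, the gain expressed as a percentage of the initial (T.S.I.) score and the residual gain, can be written out as a short Python sketch. The function names and the sample means are hypothetical; only the definitions come from the text.

```python
def percent_gain(initial_mean: float, final_mean: float) -> float:
    """Improvement from T.S.I. to T.S.II. as a percentage of the T.S.I. mean."""
    return 100.0 * (final_mean - initial_mean) / initial_mean


def residual_gain(exp_initial: float, exp_final: float,
                  ctl_initial: float, ctl_final: float) -> float:
    """Percentage gain of an experimental group minus that of its control group.

    For error scores (e.g. cancellation errors), a negative residual gain
    means the experimental group reduced its errors more than its controls.
    """
    return percent_gain(exp_initial, exp_final) - percent_gain(ctl_initial, ctl_final)


# Hypothetical group means: experimentals 20 -> 25, controls 20 -> 22.
print(residual_gain(20.0, 25.0, 20.0, 22.0))  # 15.0
```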

Combining certain data of Table II we have Table III, showing the 
residual gains of the experimental groups in certain kinds of functions. 


TABLE III.—RESIDUAL GAINS OF TABLE II COMBINED ACCORDING TO KIND OF
MENTAL FUNCTION

                                          VIIIa    VIIa-VIIIb   VIIa-VIIIa    VIa
Courtis arithmetic
  Attempts and rights in addition,
    subtraction, and multiplication
    (6 tests)                              4.36       9.53         6.95      13.97¹
  Speed (attempts, 3 tests)               -1.68      11.31         4.82      19.98²
  Accuracy (rights, 3 tests)              10.40       7.76         9.08       7.96³
Mental division                           14.70      38.66        26.68
Simultaneous adding                       13.44      -1.27         6.09       4.75
Cancellation, speed                        3.97       7.88         5.93      12.00
Cancellation, errors                     -22.98     -33.70       -28.34       5.12
Memory                                     -.13        .60          .24        .46
Judging length of lines                  -18.68      -5.36       -12.02     -22.58

1 Attempts and rights in addition, subtraction, multiplication and division (8 tests).
2 Attempts and rights in addition, subtraction, multiplication and division (4 tests).
3 Rights in addition, subtraction, multiplication and division (4 tests).


Mental multiplication yielded a residual gain of less than 5 per
cent in speed on the first three arithmetic tests, and one of 9 per cent
in accuracy, while substitution in Grade VI was followed by a residual
gain of 20 per cent in speed on the four arithmetic tests, and one of
8 per cent in accuracy. This might probably be expected, since
accuracy was greatly stressed every day in the mental multiplication
practice, and speed in the substitution practice. In speed of cancellation
Grade VII and VIII experimentals gained about 6 per cent, while
Grade VI experimentals gained 12 per cent. But in accuracy of
cancellation (measured by number of errors) the residual gain of the
former group was -28 per cent, and of the latter 5 per cent; i.e., the
Grade VII and VIII experimentals decreased the number of errors
to a greater extent than their controls, thereby increasing their
accuracy, while the reverse is true of the Grade VI group. Apparently special
emphasis upon speed in the one case and upon accuracy in the other
carried over to the test series.

In memory (French vocabulary and immediate memory) in
Grade VI there was no transfer, although some might have been expected from
the training in substitution. This failure may be due to the lack of
interest of many pupils in Grade VI during the last half of the training
in substitution, a lack which may have carried over to the memory
tests of the final test series.

In judging the length of lines, on the other hand, the residual 
gains from mental multiplication do not show as much increase in 
accuracy as those from the substitution. Probably the most impor- 
tant fact about the results of these two tests is that nearly all groups 
were less accurate in the second test series, the experimentals increasing 
their inaccuracy less than their controls. 


III 


The relation between intelligence and transfer can be determined 
in several ways. Two simple ways are (1) to divide each experimental
group into quartiles according to intelligence, and compare the 
amount of transfer of each quartile, and (2) to find the coefficients of 
correlation between transfer and intelligence. 

The amount of transfer for each quartile can be estimated by 
finding the difference between its mean improvement and that of the 
corresponding quartile of the control group, or that of the entire 
corresponding control group. The amount of transfer may be esti- 
mated for each individual of an experimental group by finding the 
difference between his improvement in each test and (a) the mean 
improvement of the proper control group in each test, or (b) the 
improvement made in each test by the corresponding pupil of the 
control group. Or, various combinations of these methods may 
be used. 
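Methods (a) and (b) above differ only in what is subtracted from each experimental pupil's improvement. A minimal Python sketch, assuming gains are already computed per pupil and, for method (b), that the two lists are in matched-pair order; the function names and figures are hypothetical.

```python
from statistics import mean
from typing import List


def transfer_vs_control_mean(exp_gains: List[float], ctl_gains: List[float]) -> List[float]:
    """Method (a): each experimental pupil's gain minus the control group's mean gain."""
    ctl_mean = mean(ctl_gains)
    return [g - ctl_mean for g in exp_gains]


def transfer_vs_matched_pupil(exp_gains: List[float], ctl_gains: List[float]) -> List[float]:
    """Method (b): each experimental pupil's gain minus that of his matched control."""
    if len(exp_gains) != len(ctl_gains):
        raise ValueError("method (b) needs one control pupil per experimental pupil")
    return [e - c for e, c in zip(exp_gains, ctl_gains)]


exp_gains = [5.0, 7.0, 3.0]   # hypothetical gains, T.S.I. to T.S.II.
ctl_gains = [4.0, 2.0, 3.0]
print(transfer_vs_control_mean(exp_gains, ctl_gains))   # [2.0, 4.0, 0.0]
print(transfer_vs_matched_pupil(exp_gains, ctl_gains))  # [1.0, 5.0, 0.0]
```

Method (b) rests on a single control pupil per estimate, which is the weakness the following paragraph discusses.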










In order to measure exactly the amount of an individual’s transfer 
we must know how much of the improvement from T.S.I. to T.S.II. 
is due to practice in taking the tests. This we can never know for 
any individual of the experimental group, so we infer it from the gains 
of the control group. But to assume that the amount of an indi- 
vidual’s transfer is even approximately determined by finding how 
much more improvement he makes from T.S.I. to T.S.II. than the 
corresponding pupil of the control group, is a highly questionable 
procedure because no one believes that the practice effect for a given 
chronological age and IQ could be determined from one or two cases. 

An adequate measure of practice effect from taking the training 
series would be the mean practice effect and its SD for several pupils 
of each intellectual level at each chronological age; e.g., the mean 
improvements from practice in taking Courtis addition (and their 
SD’s) for say 100 (or more) 14-year-old boys whose IQ’s range from
105.00 to 109.99, for 100 14-year-old boys whose IQ’s range from 110.00 
to 114.99, etc. If we had this information, we could then subtract 
from each experimental’s gain in a test the mean gain (practice effect) 
of a group having the same chronological age and IQ, express each 
deviation as a function of the SD of the practice effect for that chrono- 
logical age and IQ, and treat these sigma functions as raw transfer 
scores in calculating the coefficients of correlation. 
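The scoring the author wishes were possible can be sketched in Python. The norms table below is entirely hypothetical (the text states that no such data exist); the 5-point IQ bands follow the example in the text, and the names are illustrative.

```python
from typing import Dict, Tuple

# Hypothetical practice-effect norms for one test:
# (age in years, lower bound of a 5-point IQ band) -> (mean gain, SD of gain).
Norms = Dict[Tuple[int, int], Tuple[float, float]]


def iq_band(iq: float, width: int = 5) -> int:
    """Lower bound of the band containing iq, e.g. 107.3 -> 105 (105.00 to 109.99)."""
    return int(iq // width) * width


def sigma_transfer(gain: float, age: int, iq: float, norms: Norms) -> float:
    """Deviation of a pupil's gain from the practice effect for his age and IQ,
    in units of the SD of that practice effect."""
    mean_gain, sd_gain = norms[(age, iq_band(iq))]
    return (gain - mean_gain) / sd_gain


norms: Norms = {(14, 105): (3.0, 2.0), (14, 110): (4.0, 2.5)}
print(sigma_transfer(7.0, 14, 107.3, norms))  # 2.0
```

The resulting sigma scores would then serve as the raw transfer scores in the correlations.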

We do not have data from which the necessary means and SD’s 
can be found for groups corresponding in chronological age and IQ 
to each individual in our experimental groups, nor are data available 
from which this information could be obtained. Accordingly, we 
have to use the rather crude method of estimating each child’s transfer 
by finding the difference between his improvement in a test and the 
mean improvement of the proper control group in it. 

The results for the VIIa, VIIIb and VIIIa pupils have been com- 
bined so as to have a larger group in finding the correlations. The 
effect of grade is eliminated by using as raw scores the deviation of 
each pupil’s gain in each test from the mean gain of the control group 
of the same grade in each test. The combined scores on the
intelligence tests were used, effect of grade being eliminated as above, by
using the means of each experimental group. Pearson coefficients 
of correlation were found for all test results. Since there seems to be 
no transfer in some cases, it is obvious that some of the coefficients 
show the relation between intelligence test scores and practice effect 
in taking the test series, rather than that between intelligence and 







transfer. Correlations between IQ and transfer were found in a similar
manner, and are shown in Table IV. IQ’s were found by dividing the
mean of each child’s Haggerty and Otis mental ages by his chronological
age in months.
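The computations of the last two paragraphs (grade-centred scores, IQ from the mean mental age, and Pearson's r) can be sketched in Python. The text does not say whether IQ was multiplied by 100; the factor is included here as a conventional assumption, and all names and figures are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List


def iq(ma_haggerty: float, ma_otis: float, ca_months: float) -> float:
    """Mean of the two mental ages divided by chronological age (x 100, assumed)."""
    return 100.0 * (ma_haggerty + ma_otis) / 2 / ca_months


def centred(values: List[float], grades: List[str], ref_means: Dict[str, float]) -> List[float]:
    """Subtract from each value the reference mean for that pupil's grade
    (control-group mean gain for transfer; experimental-group mean for intelligence)."""
    return [v - ref_means[g] for v, g in zip(values, grades)]


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson product-moment coefficient of correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical pupils from two grades combined into one group.
grades = ["VIIIa", "VIIIa", "VIIa", "VIIa"]
gains = [6.0, 4.0, 9.0, 5.0]
scores = [120.0, 110.0, 100.0, 94.0]
transfer = centred(gains, grades, {"VIIIa": 3.0, "VIIa": 5.0})   # control mean gains
intel = centred(scores, grades, {"VIIIa": 115.0, "VIIa": 97.0})  # experimental means
print(round(pearson_r(intel, transfer), 3))  # 0.844
```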


TABLE IV.—PEARSON COEFFICIENTS OF CORRELATION BETWEEN THE AMOUNT
OF TRANSFER AND (a) INTELLIGENCE TEST SCORES (ON HAGGERTY AND
OTIS GROUP TESTS) AND (b) IQ

                                 Intelligence test
                                 scores and transfer         IQ and transfer
                                 VIIa-VIIIa    VIa           VIIa-VIIIa    VIa
                                 (n = 44)      (n = 36)      (n = 44)      (n = 36)
Courtis arithmetic
  Addition attempts               -.018        -.081           .070         .111
  [illegible]                     -.066         .093           .028         .293
  [illegible]                      .219         .133           .120         .315
  [illegible]                      .046         .191           .111         .246
  [illegible]                     -.113         .180           .021         .372
  [illegible]                     -.188         .236          -.141         .318
Mental division attempts           .244         .121           .163         .279
  [illegible]                      .296         .107           .323         .102
Number digits right               [illegible]  [illegible]    [illegible]   .265
Simultaneous adding               -.103         .089          -.086         .039
[illegible]                        .165        -.139           .108        -.096
[illegible]                       -.021         .141          -.054         .152
[illegible]                       -.145                       -.166
French vocabulary                 -.011         .161           .078         .171
[row illegible; one legible entry, -.030]
Immediate memory
  [illegible]                      .121         .079           .186         .119
  [illegible]                     -.191        -.076          -.096        -.003
  [illegible]                      .300         .249           .275         .183
Error in drawing lines¹           -.113         .191          -.048         .206
Error bisecting lines¹            -.116         .406          -.056         .150

1 Measuring errors; reverse signs of these coefficients.


Probably the most significant fact about these coefficients of 
correlation is that neither intelligence test scores nor IQ has any
close positive relation to the amount of transfer in the tests showing 
transfer, nor to practice effect in taking T.S.I. and T.S.II. in those 
cases where there is no transfer. In Grades VII and VIII the 










significant correlations are between transfer to mental division and IQ 
(and intelligence test scores), the coefficients being positive and rang- 
ing from .24 to .32. The correlations involving memory span are of 
little significance since so many individuals made exactly the same 
scores in T.S.I. and T.S.II. In Grade VI the brighter pupils made 
slightly greater gains in the arithmetic tests, while those with higher 
mental ages profited least in bisecting lines—made less decrease in 
their errors. 

The correlations between intelligence test scores (and IQ) and the 
combined gains in several tests were found by expressing each indi- 
vidual’s gain in a test as a sigma deviation from the mean gain of 
the control group in that test, and then treating the mean of his sigma 
deviations from the tests in question as his raw score. No significant 
differences in the correlations were found. 
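The combined score described above can be sketched in Python, assuming each test's sigma is the standard deviation of the control group's gains in that test (the text does not say which SD was used). Names and data are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def combined_sigma_score(pupil_gains: Dict[str, float],
                         control_gains: Dict[str, List[float]]) -> float:
    """Mean, over the tests in question, of the pupil's gain expressed as a
    deviation from the control mean gain in units of the control SD."""
    deviations = []
    for test, gain in pupil_gains.items():
        ctl = control_gains[test]
        deviations.append((gain - mean(ctl)) / pstdev(ctl))
    return mean(deviations)


control = {"addition": [1.0, 3.0], "division": [0.0, 4.0]}  # hypothetical
print(combined_sigma_score({"addition": 4.0, "division": 0.0}, control))  # 0.5
```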

Apparently pupils having higher IQ’s or higher mental ages are 
not the only ones profiting from the training in mental multiplication 
and substitution. Slow and average pupils also gain from it, although 
in some tests they do not gain as much as the former. 

Dividing the experimental groups into quartiles according to
intelligence and comparing the transfer of each, it appears that
intelligence (intelligence-test scores and IQ) and transfer of training
from mental multiplication have a slightly positive relation in the
case of mental division and words in immediate memory span, a
slightly negative relation in the case of multiplication rights, simultaneous
adding, and substitution, and a neutral relation in the other cases; and that
intelligence and transfer from substitution have a low positive relation
in the case of multiplication attempts and rights, division attempts,
simultaneous adding, and words in memory span, a slightly negative
relation in the case of digits in immediate memory span, and a neutral relation
in the other cases.
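The quartile comparison can be sketched in Python. It uses the variant that compares each quartile with the mean gain of the entire control group, one of the options named earlier in the article; the function name and figures are hypothetical.

```python
from statistics import mean
from typing import List


def transfer_by_quartile(intelligence: List[float], exp_gains: List[float],
                         control_mean_gain: float) -> List[float]:
    """Rank experimental pupils by intelligence (highest first), split them into
    four nearly equal quartiles, and return each quartile's mean gain minus the
    control group's mean gain."""
    n = len(intelligence)
    if n < 4:
        raise ValueError("need at least four pupils to form quartiles")
    order = sorted(range(n), key=lambda i: intelligence[i], reverse=True)
    result = []
    for q in range(4):
        members = order[q * n // 4:(q + 1) * n // 4]
        result.append(mean(exp_gains[i] for i in members) - control_mean_gain)
    return result


scores = [100, 90, 80, 70, 60, 50, 40, 30]
gains = [8.0, 8.0, 6.0, 6.0, 4.0, 4.0, 2.0, 2.0]
print(transfer_by_quartile(scores, gains, 3.0))  # [5.0, 3.0, 1.0, -1.0]
```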


CONCLUSIONS 


1. Clearly, transfer does take place from a few hours’ training in
mental multiplication and substitution. 

2. It is greatest in functions most closely related in content or 
processes, and in functions in which certain emphasized ideals of 
procedure may be used. 

3. The gain from practice does not spread much to functions 
having little in common with the training, and even fails to spread 
from training in substitution to other memory functions. 











422 The Journal of Educational Psychology 


4. We do not know the permanence of transfer in any of our cases, 
nor do we know the relation of such permanence to mental age or 
degree of intelligence, although both of these problems are of great 
importance in the psychology of learning. 

5. We need to know who profits most from any spread of training:
the pupils having higher IQ’s or higher mental ages, or those whose 
IQ’s or mental ages are not so great. In this investigation mental 
age and IQ are found in most of the tests to bear a low positive or 
negative (really neutral) relation to residual gains or transfer. But 
many of the coefficients of correlation really represent the relation 
between practice in taking tests and intelligence. 

6. Transfer from mental multiplication to mental division is 
somewhat in proportion to both mental age and IQ, but the relation is 
not close, the coefficients ranging from .24 to .32. 

7. The mean of the correlations between intelligence-test scores 
and transfer is .048 for Grade VII and VIII pupils, and .036 for the 
Grade VI pupils; those for IQ and transfer are .069 in the Grades 
VII and VIII, and .114 in Grade VI. 

8. We do not know the relation between intelligence and transfer 
from a longer period of training. The present investigation was not 
continued to the point of interfering with the regular work of the 
classes. Even if longer training had been given by these intensive 
methods, the limits of pupil interest would doubtless soon have been 
reached for the majority of the pupils.

9. If studies of transfer and intelligence are to be conclusive, 
we must have definite, extensive knowledge of practice effect from 
taking tests for different chronological ages and different intellec- 
tual levels. 

REFERENCES 
Judd, C. H.: The Relation of Special Training to General Intelligence.
Educational Review, Vol. XXXVI, 1908, pp. 28-42.

Rugg, H. O.: The Experimental Determination of Mental Discipline in School
Studies. Warwick & York, Baltimore, 1916, p. 132.

Thorndike, E. L.: Mental Discipline in High School Studies. Journal of
Educational Psychology, Vol. XV, 1924, pp. 1-24, 83-98.










THE ACCURACY OF CERTAIN STANDARD TESTS 
FOR SCHOOL SECTIONING AND MARKING 


PERCIVAL M. SYMONDS 


University of Hawaii 


In a previous paper entitled “The Accuracy of Certain Standard
Tests for School Classifications,”¹ I discussed the accuracy of gross
scores relative to their use in the classification of pupils in the ele- 
mentary school. In this paper I shall discuss the accuracy of derived 
scores which have been proposed by various workers to make the tests 
more generally useful. The results are based on the same testing pro- 
gram as that described in the paper before mentioned and the reader is 
referred to that paper for the statement of the conditions of the testing. 

The first derived score is the “age,” such as mental age, educational
age, reading age, arithmetic age. An arithmetic age may be defined
as the age of the average child making the score on the test under
consideration. At least it is by this definition that “ages” are usually
found. Franzen (1923) proposes that the definition be as follows:
Arithmetic age is the average age of all children making the score on the
test under consideration. Even though the latter is definable
theoretically, most “ages” are determined on the basis of the first definition.
Mental age should, strictly speaking, be reserved for the Binet test.
However, if the test used is stated in each case, not much harm is
done if the term mental age is used for “ages” derived from group
intelligence tests.
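The two definitions of an “age” can be contrasted in a short Python sketch. Linear interpolation between the mean scores of successive age groups is an assumption about how the usual definition is applied; the norms, pupils, and function names are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def age_of_average_child(score: float, norms: List[Tuple[float, float]]) -> float:
    """Usual definition: norms are (age in months, mean score of that age group),
    with mean scores rising with age. Returns the age whose average child makes
    `score`, interpolating linearly between adjacent age groups."""
    points = sorted(norms)
    for (a0, s0), (a1, s1) in zip(points, points[1:]):
        if s0 <= score <= s1:
            if s1 == s0:
                return a0
            return a0 + (a1 - a0) * (score - s0) / (s1 - s0)
    raise ValueError("score lies outside the range of the norms")


def franzen_age(score: float, children: List[Tuple[float, float]]) -> float:
    """Franzen's definition: average age of all children (age, score) making `score`."""
    ages = [age for age, s in children if s == score]
    if not ages:
        raise ValueError("no child made this score")
    return mean(ages)


norms = [(120.0, 30.0), (132.0, 36.0), (144.0, 42.0)]
print(age_of_average_child(33.0, norms))                             # 126.0
print(franzen_age(33.0, [(118.0, 33.0), (130.0, 33.0), (140.0, 40.0)]))  # 124.0
```

The first function inverts the regression of score on age, the second uses the regression of age on score, so the two definitions need not give the same age.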

In turning scores into ages, I have had to extend artificially the 
norms given in the manual of directions for the National Intelligence 
Tests and I have had to construct out of what data I had age norms for 
the Woody-McCall Mixed Fundamental Tests. The Thorndike- 
McCall manual contains age norms. Following are the accuracy con- 
stants for the ages: 





1 Journal of Educational Research, March, 1924. 
423 













TABLE I

                            N      σ        r₁₁     σ age     PE age
                                   Months           Months    Months
NIT A mental age           232    23.9     .922     6.7       4.5
NIT B mental age           242    22.4     .949     5.1       3.4
Th-Mc. reading age         232    22.2     .794    10.1       6.8
Wo-Mc. arithmetic age      229    25.7     .855     9.8       6.6




















$\sigma_{\text{age}}$ was computed from the formula $\sigma_{\text{age}} = \sigma\sqrt{1 - r_{11}}$.¹


1 Dr. Kelley has shown in his book “Statistical Methods” that in any
particular case we may estimate true scores by using the formula
$X_\infty = r_{11}X_1 + (1 - r_{11})M_1$, and that this estimate has a standard
error $\sigma_1\sqrt{r_{11}}\sqrt{1 - r_{11}}$, which is less than the standard
deviation of the differences between the obtained scores and the true
scores ($X_1 - X_\infty$). We can only estimate these true scores with a
given set of data with known M and σ, and the standard error of these
estimated true scores differs with the heterogeneity of the group.
Accordingly, in stating the general accuracy of a test score, it seems to
me preferable to make this statement in terms of $\sigma_1\sqrt{1 - r_{11}}$, the
standard error of obtained scores from a true score, although in any
particular study the true scores of the group may be estimated with a
smaller standard error.
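The accuracy constants of Table I, together with Kelley's estimate described in the footnote, can be reproduced with a short Python sketch. The factor 0.6745 converting a standard error to a probable error is the usual one and is assumed here; the function names are illustrative.

```python
from math import sqrt

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error (assumed convention)


def se_obtained(sd: float, r11: float) -> float:
    """Standard error of obtained scores about the true score: sd * sqrt(1 - r11)."""
    return sd * sqrt(1 - r11)


def kelley_true_score(x: float, r11: float, group_mean: float) -> float:
    """Kelley's estimate of a true score: r11 * x + (1 - r11) * mean."""
    return r11 * x + (1 - r11) * group_mean


def se_estimated_true(sd: float, r11: float) -> float:
    """Standard error of Kelley's estimate: sd * sqrt(r11) * sqrt(1 - r11)."""
    return sd * sqrt(r11) * sqrt(1 - r11)


# NIT A mental age from Table I: sd = 23.9 months, r11 = .922.
se = se_obtained(23.9, 0.922)
print(round(se, 1), round(PE_FACTOR * se, 1))  # 6.7 4.5
```

Because $\sqrt{r_{11}} < 1$, `se_estimated_true` is always smaller than `se_obtained`, which is the footnote's point.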


Table II gives constants for the quotients derived from these tests. 








TABLE II

                 Quotient    N      σ       r₁₁     σ quotient    PE quotient
NIT A            IQ          232    13.7    .900     4.33          2.9
NIT B            IQ          242    12.9    .856     4.90          3.4
Th-Mc.           RQ          232    18.8    .742     7.02          4.7
Wo-Mc.           ArQ         229    14.0    .789     6.43          4.3























Table III gives constants for the quotients derived from composites
and repetitions of these tests. The formula used to compute these
constants is:

$$\sigma_{(Q_1 + Q_2 + \cdots + Q_n)/n} = \frac{\sqrt{\sigma_{Q_1}^2 + \sigma_{Q_2}^2 + \cdots + \sigma_{Q_n}^2}}{n}$$

This formula is readily derived through application of the law of error

$$\sigma_{A + B + C + \cdots + N} = \sqrt{\sigma_A^2 + \sigma_B^2 + \sigma_C^2 + \cdots + \sigma_N^2},$$

where the errors combined are uncorrelated.
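Applied to the quotient errors of Table II, the formula can be checked with a few lines of Python (the function name is hypothetical, and the usual 0.6745 probable-error factor is assumed). Averaging the NIT A and NIT B IQ errors gives about 3.27, which agrees with the first IQ entry of Table III.

```python
from math import sqrt
from typing import Sequence


def se_of_mean_quotient(errors: Sequence[float]) -> float:
    """Standard error of the mean of n quotients whose errors are uncorrelated:
    sqrt(sum of squared standard errors) / n."""
    return sqrt(sum(e * e for e in errors)) / len(errors)


combined = se_of_mean_quotient([4.33, 4.90])   # NIT A IQ and NIT B IQ, Table II
print(round(combined, 2), round(0.6745 * combined, 2))  # 3.27 2.21
```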




























Tests for School Sectioning and Marking 425 
TABLE III

                                       σ_quotient   PE_quotient
NIT A + NIT B ............... IQ          3.27          2.21
Th-Mc + Wo-Mc ............... EQ          4.76          3.21
2 NIT A ..................... IQ          3.06          2.06
2 NIT B ..................... IQ          3.47          2.34
2 NIT A + 2 NIT B ........... IQ          2.31          1.56
2 Th-Mc ..................... RQ          4.97          3.35
2 Wo-Mc ..................... ArQ         4.55          3.07
2 Th-Mc + 2 Wo-Mc ........... EQ          3.37          2.28








The IQ has been proposed as a unit on which to base sectioning. Suppose that two sections have been formed on the basis of the IQ. A question that may be raised is how inaccurate this sectioning is, that is, what per cent of the total group (grade) has been placed in the wrong section. This error in dichotomic sectioning may be determined by using the formula $r = \cos(\pi u)$, where $u$ is the proportion of unlike-signed pairs, and solving for $u$. Here $r$ is $\sqrt{r_{11}}$, i.e., the correlation between the obtained scores and the true scores of the function. Solving for $u$ gives the following per cent errors in dichotomic sectioning:


TABLE IV

                                          Per Cent Error in
                                        Dichotomic Sectioning
NIT A ......................... IQ             19.9
NIT B ......................... IQ             23.4
NIT A + NIT B ................. IQ             16.3
2 NIT A ....................... IQ             14.6
2 NIT B ....................... IQ             17.8
2 NIT A + 2 NIT B ............. IQ             11.7


These errors in dichotomic sectioning are not all of the same significance. A certain individual, say with an IQ of 110 on a first testing, might by chance have a real IQ below 100. In this case there is a grievous error in sectioning. But another individual might obtain an IQ of 101 on a first testing when his real IQ was 99. Here, again, he would be placed in the wrong section if the dividing line were 100 IQ. Yet this is a very mild error: this latter individual lies so near the border line that it makes little difference in which section he is placed. Perhaps a more significant statement of the error in sectioning is the per cent of pupils wrongly sectioned by 5 IQ or more. By use of Everett's tables (see Table XXX of Pearson's Tables) the following table was derived, showing the per cent wrongly sectioned by 5 IQ or more.


TABLE V. PER CENT WRONGLY SECTIONED BY 5 IQ OR MORE

                                              Per Cent
NIT A ......................... IQ               8.2
NIT B ......................... IQ
NIT A + NIT B ................. IQ               5.0
2 NIT A ....................... IQ               4.6
2 NIT B ....................... IQ               6.8
2 NIT A + 2 NIT B ............. IQ               3.4


The error in dividing a class of normal spread (σ = 13 IQ) into two sections by means of National Intelligence Scale A may be stated thus: 20 per cent are in the wrong section, or 8 per cent are wrongly sectioned to the extent of 5 IQ or more. These results should enable school administrators to estimate the accuracy of sectioning by means of standardized tests more closely than has heretofore been possible. Once a test is chosen that offers a valid basis on which to section, one may depend on the sectioning to be accurate approximately to the extent noted above.

The accomplishment ratio has been suggested as a unit to be used 
in marking or rating pupils which will indicate achievement relative to 
capacity. One should know accordingly the accuracy of the accom- 
plishment ratio. Chapman (1923) first called attention to the 
relatively low reliability that could be placed on a unit which repre- 
sented the difference between intelligence and educational ratings. 
He says: “It is apparent that the difference in standing in a single 
test of intelligence and a single school achievement test gives almost no 
basis of prediction within a typical grade group of what the differ- 
ence will be when two other similar tests designed to measure the same 
factors are employed." Kelley (1923) gives still other serviceable formulas for determining the significance of differences in intelligence and achievement scores. He asserts that he is "in the main in agreement with Chapman's position."

Using the same data, I have determined the coefficient of reliability and the PE of a unit of AR, using as usual for the comparison the first and second scores on the tests in question. The next table gives these facts:








                                        N       σ      r₁₁     σ_AR    PE_AR
Reading (Th-Mc, NIT A) AR .........   226     11.9    .344     9.60    6.47
Reading (Th-Mc, NIT B) AR .........   201     11.7    .230    10.27    6.92
Arithmetic (Wo-Mc, NIT A) AR ......   224     14.2    .598     9.00    6.06
Arithmetic (Wo-Mc, NIT B) AR ......   196     12.3    .487     8.82    5.95


As was expected, the reliability coefficient is not high and the PE 
of an accomplishment ratio is considerably higher than for an IQ or 
subject quotient. 


Giving each of the tests twice, we obtain the following:

                                              σ_AR     PE_AR
2 Reading (Th-Mc, NIT A) AR ..............    6.76     4.56
2 Reading (Th-Mc, NIT B) AR ..............    7.25     4.88
2 Arithmetic (Wo-Mc, NIT A) AR ...........    6.36     4.29
2 Arithmetic (Wo-Mc, NIT B) AR ...........    6.24     4.20
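The entries in the two AR tables follow the same rules as before: $\sigma\sqrt{1 - r_{11}}$ for one giving, divided by $\sqrt{2}$ when two givings are averaged (which assumes the errors of the two givings are uncorrelated). A Python sketch with illustrative names:

```python
from math import sqrt

PE_FACTOR = 0.6745

# (N, sigma of AR, reliability r11) as printed in the AR table
AR_TABLE = {
    "Reading (Th-Mc, NIT A)": (226, 11.9, 0.344),
    "Reading (Th-Mc, NIT B)": (201, 11.7, 0.230),
    "Arithmetic (Wo-Mc, NIT A)": (224, 14.2, 0.598),
    "Arithmetic (Wo-Mc, NIT B)": (196, 12.3, 0.487),
}

def ar_errors(sigma, r11, givings=1):
    """Standard and probable error of an AR; averaging `givings`
    repetitions with uncorrelated errors divides the standard error
    by sqrt(givings)."""
    se = sigma * sqrt(1 - r11) / sqrt(givings)
    return se, PE_FACTOR * se

for name, (_n, sigma, r11) in AR_TABLE.items():
    se1, pe1 = ar_errors(sigma, r11)
    se2, pe2 = ar_errors(sigma, r11, givings=2)
    print(f"{name:26s} once: {se1:5.2f} {pe1:4.2f}   twice: {se2:4.2f} {pe2:4.2f}")
```

The recomputed values agree with the printed ones to within about .06, the largest gap being for the reading AR with NIT A; differences of this size presumably reflect rounding in the original calculations.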


Suppose that a school decides to issue reports to the parents on the basis of the AR, using the following system:

MARKING SCHEME 1

              Per Cent of Pupils Normally Included in Each Group
                  Reading                 Arithmetic
              Th-Mc, NIT A AR          Wo-Mc, NIT A AR
Group 1              10                       14
Group 2              24                       22
Group 3              32                       28
Group 4              24                       22
Group 5              10                       14

MARKING SCHEME 2

Group 1               5                        8
Group 2              15                       16
Group 3              30                       26
Group 4              30                       26
Group 5              15                       16
Group 6               5                        8


Then the various AR's have a PE equal to the following fractions of the letter interval:

                                          PE_AR Expressed as Fraction
                                          of Letter Interval,
                                          Marking Scheme 1 or 2
Reading (Th-Mc, NIT A) AR ..............          .65
Reading (Th-Mc, NIT B) AR ..............          .69
Arithmetic (Wo-Mc, NIT A) AR ...........          .61
Arithmetic (Wo-Mc, NIT B) AR ...........          .60
2 Reading (Th-Mc, NIT A) AR ............          .46
2 Reading (Th-Mc, NIT B) AR ............          .49
2 Arithmetic (Wo-Mc, NIT A) AR .........          .43
2 Arithmetic (Wo-Mc, NIT B) AR .........          .42



















This is to be read: Half the pupils will receive a reading AR within 
.65 of the distance between one letter and the next and half the 
pupils will receive an AR more than .65 of the distance between one 
letter and the next. 


Using another marking system we get even better results. 


MARKING SCHEME 3

              Per Cent of Pupils Normally Included in Each Group
                  Reading                 Arithmetic
              Th-Mc, NIT A AR          Wo-Mc, NIT A AR
Group 1               2                        5
Group 2              23                       24
Group 3              50                       42
Group 4              23                       24
Group 5               2                        5


The table giving the ratio that the PE of an AR bears to a letter interval is:

                                          PE_AR Expressed as a Fraction
                                          of a Letter Interval,
                                          Marking Scheme 3
Reading (Th-Mc, NIT A) AR ..............          .40
Reading (Th-Mc, NIT B) AR ..............          .43
Arithmetic (Wo-Mc, NIT A) AR ...........          .38
Arithmetic (Wo-Mc, NIT B) AR ...........          .37
2 Reading (Th-Mc, NIT A) AR ............          .29
2 Reading (Th-Mc, NIT B) AR ............          .31
2 Arithmetic (Wo-Mc, NIT A) AR .........          .27
2 Arithmetic (Wo-Mc, NIT B) AR .........          .26


This is to be read: Half of the pupils will receive a reading AR within .40 of the distance between one letter and the next in Marking Scheme 3, while the other half will receive a reading AR more than .40 of the distance between one letter and the next.
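The text does not say how the letter interval was measured. One reading that roughly recovers the Scheme 3 figures is to take the interval as the width, under a normal distribution, of the middle letter group (50 per cent for reading, 42 per cent for arithmetic), scaled by the single-test σ of the NIT A ARs (11.9 and 14.2), used for every row. This is an assumption, not the author's stated method; a Python sketch with hypothetical names:

```python
from statistics import NormalDist

def middle_group_width(per_cent_middle, sigma):
    """Width, in AR units, of the central group holding `per_cent_middle`
    per cent of a normal distribution with standard deviation `sigma`."""
    z = NormalDist().inv_cdf(0.5 + per_cent_middle / 200)
    return 2 * z * sigma

# Assumed letter intervals for Marking Scheme 3
READING_INTERVAL = middle_group_width(50, 11.9)
ARITHMETIC_INTERVAL = middle_group_width(42, 14.2)

# Probable errors of the AR, from the AR tables
pe_reading = {"NIT A": 6.47, "NIT B": 6.92, "2 NIT A": 4.56, "2 NIT B": 4.88}
pe_arithmetic = {"NIT A": 6.06, "NIT B": 5.95, "2 NIT A": 4.29, "2 NIT B": 4.20}

for label, pe in pe_reading.items():
    print(f"Reading    {label:8s} {pe / READING_INTERVAL:.2f}")
for label, pe in pe_arithmetic.items():
    print(f"Arithmetic {label:8s} {pe / ARITHMETIC_INTERVAL:.2f}")
```

Under this assumption the ratios come within about .01 of the printed Scheme 3 figures.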

In my previous paper I showed that, with a single test, the Thorndike-McCall test has a PE equivalent to .53 of a grade difference and the Woody-McCall test a PE equivalent to .48 of a grade difference. The AR determined from a single giving of these tests has a PE of .40 and .38 of the difference between one letter and the next in a certain marking scheme. The conclusion is that the AR is more accurate as a unit for marking than the score is for placing a pupil in the proper grade.













Using a method described in the preceding paper, I derive the following table:

TABLE SHOWING THE NUMBER OF REPETITIONS OF THE TEST NECESSARY TO MARK A CERTAIN PERCENTAGE OF THE PUPILS OF A SCHOOL IN READING WITH A CERTAIN ACCURACY, EXPRESSED AS A FRACTION OF THE DIFFERENCE BETWEEN ONE LETTER AND THE NEXT (MARKING SCHEME 3), USING THE READING AR OBTAINED FROM THE THORNDIKE-McCALL READING SCALE AND NATIONAL INTELLIGENCE TEST SCALE A





                  Per cent of distance from one letter to another
Per cent of
pupils             1.00       .75       .50       .25
   50                1          1         1         3
                    .17                  .66      2.62
   75                1          1         2         8
                    .48        .85      1.91      7.6
   80                1          2         3
                    .64       1.12      2.56
   90                1          2         4
                    .97       1.73      3.90
   95                2          3         6
                   1.38       2.45      5.5
   98                2          4         8
                   1.94       2.43      7.75
   99                3          5        10
                   2.38       4.23      9.5
| | 








This is to be read: To mark 50 per cent of the pupils of a school accurately to within the whole difference between one letter and the next (Marking Scheme 3), giving the test once is sufficient. To mark 95 per cent of the pupils to within .50 of the difference between letters, giving each test four times is necessary.
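The paper only refers to "a method described in the preceding paper", so the Python sketch below is one plausible reconstruction, not the author's method. It assumes normally distributed errors, a PE that shrinks as $1/\sqrt{n}$ when $n$ givings are averaged, and a single-giving PE of .40 of a letter interval (the reading figure for Marking Scheme 3); the function name is hypothetical:

```python
from math import ceil
from statistics import NormalDist

Z_PE = NormalDist().inv_cdf(0.75)  # 0.6745: half of a normal distribution lies within 1 PE

def givings_needed(pe_one, per_cent_pupils, distance):
    """Return (exact n, whole number of givings) so that `per_cent_pupils`
    per cent of pupils are marked within `distance` letter intervals,
    assuming normal errors whose PE for one giving is `pe_one` letter
    intervals and falls as 1/sqrt(n) when n givings are averaged."""
    z = NormalDist().inv_cdf(0.5 + per_cent_pupils / 200)
    exact = (z / Z_PE * pe_one / distance) ** 2
    return exact, ceil(exact)

DISTANCES = (1.00, 0.75, 0.50, 0.25)
for pupils in (50, 75, 80, 90, 95, 98, 99):
    counts = [givings_needed(0.40, pupils, d)[1] for d in DISTANCES]
    print(f"{pupils} per cent: {counts}")
```

Rounded up, these counts reproduce every whole number printed in the reading table; the unrounded values run somewhat below the printed decimals, so the original calculation evidently differed in detail.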




TABLE SHOWING THE NUMBER OF REPETITIONS OF THE TESTS NECESSARY TO MARK A CERTAIN PERCENTAGE OF THE PUPILS OF A SCHOOL IN ARITHMETIC WITH A CERTAIN ACCURACY, EXPRESSED AS A FRACTION OF THE DIFFERENCE BETWEEN ONE LETTER AND THE NEXT (MARKING SCHEME 3), USING THE ARITHMETIC AR OBTAINED FROM THE WOODY-McCALL FUNDAMENTALS TEST AND THE NATIONAL INTELLIGENCE TEST SCALE A





                  Per cent of distance from one letter to another
Per cent of
pupils             1.00       .75       .50       .25
   50                1          1         1         3
                    .14        .25       .56      2.25
   75                1          1         2         7
                    .42        .73      1.63      6.5
   80                1          1         3
                    .52        .90      2.03
   90                1          2         4
                    .85       1.48      3.34
   95                2          3         5
                   1.18       2.10      4.7
   98                2          3         7
                   1.66       2.96      6.6
   99                3          4
                   2.04       3.63

















The reliability coefficients of the arithmetic AR (.60 and .49) are comparable with the reliability coefficients of school marks obtained in the customary way. Repeating the tests gives reliability coefficients (by the Spearman-Brown formula) of .750 and .655, the former being better than that usually obtained with ordinary school marks. That is, on the basis of two 35-minute intelligence tests and two 20-minute arithmetic tests, and with less effort than is usually expended on scoring ordinary school examinations, an AR is obtained which may be used as a school mark and which is more reliable than the ordinary school mark.
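The two figures follow from the Spearman-Brown formula for a test lengthened $k$ times, $r_k = kr/(1 + (k - 1)r)$. A short Python check (the function name is illustrative); .60 is the .598 of the AR table rounded, as used in the text:

```python
def spearman_brown(r, k):
    """Predicted reliability of the average of k parallel givings,
    given the reliability r of a single giving."""
    return k * r / (1 + (k - 1) * r)

# Single-giving reliabilities of the arithmetic AR (NIT A and NIT B)
for r in (0.60, 0.487):
    print(f"r = {r}: two givings -> {spearman_brown(r, 2):.3f}")
```

Using the unrounded .598 instead would give .748.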















The differences between the reliability coefficients of the AR (.34, .23, .60 and .49) may be due in part to chance, but other factors have probably contributed to them. The probable errors of these coefficients (.039, .047, .029 and .039) indicate that the differences between the coefficients are not wholly chance differences. One would expect the community of material between the tests also to make a difference. That the National Intelligence tests put more of a premium on reading than on arithmetic would explain the generally lower reliability of the reading AR. That National Intelligence B contains a test practically identical in nature with the Woody-McCall Mixed Fundamentals, while National Intelligence A does not, would lead one to expect the arithmetic AR obtained from Scale B to have a lower reliability than that from Scale A. This makes concrete the criticism that the AR obtained from verbal group tests of intelligence is not valid. In so far as the community of material between the intelligence test and the achievement test approaches identity, the reliability of the AR approaches zero. This would bring us back to Pintner's original contention that, to compare achievement with intelligence, the material used to measure intelligence should be quite different from the achievement material. Unfortunately, non-verbal material does not show the high correlation with intelligence that verbal material does. We seem reduced to a dilemma: the material which best measures differences in intelligence is the material that we most highly prize for achievement, while material which is not valued for achievement does not measure differences in intelligence so well. At present, the only solution I see is to fall back on the individual test, where the avenue of approach (oral communication) is more basic and less dependent on skill than reading. Presumably, we get our nearest reach to innate intelligence by using material of the most widespread familiarity, material that makes the least demand on skill on either the receptive or the response side.
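The probable errors quoted can be recomputed with the usual formula $PE_r = .6745(1 - r^2)/\sqrt{N}$, using the N's of the AR table. The Python sketch below also expresses the difference between the arithmetic and reading coefficients (both with NIT A) in units of its PE, treating the two coefficients as independent; that is only an approximation, since both come from the same pupils and share the intelligence test. Names are illustrative:

```python
from math import sqrt

def pe_r(r, n):
    """Probable error of a product-moment r from n cases."""
    return 0.6745 * (1 - r * r) / sqrt(n)

# (r, N) for the four accomplishment ratios in the AR table
coefficients = {
    "Reading, NIT A": (0.344, 226),
    "Reading, NIT B": (0.230, 201),
    "Arithmetic, NIT A": (0.598, 224),
    "Arithmetic, NIT B": (0.487, 196),
}
for name, (r, n) in coefficients.items():
    print(f"{name:18s} r = {r:.3f}  PE = {pe_r(r, n):.3f}")

# Difference between the two NIT A coefficients, treated as independent
r1, n1 = coefficients["Arithmetic, NIT A"]
r2, n2 = coefficients["Reading, NIT A"]
pe_diff = sqrt(pe_r(r1, n1) ** 2 + pe_r(r2, n2) ** 2)
print(f"difference = {r1 - r2:.3f}, or {(r1 - r2) / pe_diff:.1f} times its PE")
```

The recomputed PEs are within about .002 of those printed, and the difference of .254 is about five times its PE, which supports the statement that the differences are not wholly due to chance.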

The validity of the AR has been discussed by Toops and Symonds (1922, 1923). Whatever its defects, it does have certain advantages as a school mark, since it theoretically measures, with certain qualifications, the motivating effect of the school and home environment. Direct measures of educational achievement may be considered a resultant of environment acting on original capacity, and as such are a valuable marking unit: the EA is a thing that men pay for. But the AR, theoretically, is the isolated effect of the educational environment in stimulating achievement, and as such is also a valuable marking unit. The AR is a measure of the stimulating influence of the teacher, school and home. Whatever the decision as to its validity as a school mark, one may safely use it with the confidence that the arithmetic AR at least is as reliable a measure as an ordinary school mark.

Conclusion.—This paper presents certain facts concerning the reliability of certain derived measures, obtained from educational and mental tests, that are used to section pupils and to give school marks. The reader is referred to the body of the paper for these facts. Two conclusions, however, need to be emphasized:

1. The arithmetic Accomplishment Ratio obtained from single tests, as a measure to be used for school marks, is of the same order of reliability as ordinary school marks.

2. The Accomplishment Ratio is more accurate when used as a school mark than the test scores themselves are for the purposes of school grading or classification. A school principal may more safely use the AR from a single giving of a reading or arithmetic test and an intelligence test for school marking than he may use the scores from a single reading or arithmetic test to grade his pupils.


REFERENCES 


Chapman, J. C.: (1923) The Unreliability of the Difference between Intelligence and Educational Ratings. Journal of Educational Psychology, Vol. XIV: 2, Feb., 1923, pp. 103-108.

Franzen, R.: (1923) The Accomplishment Ratio.

Kelley, T. L.: (1923) A New Method for Determining the Significance of Differences in Intelligence and Achievement Scores. Journal of Educational Psychology, Vol. XIV, Sept., 1923, pp. 321-333.

Toops and Symonds: (1922-1923) What Shall We Expect of the AQ? Journal of Educational Psychology, Vol. XIII: 9, Dec., 1922, pp. 513-528; and Vol. XIV: 1, Jan., 1923, pp. 27-38.













SPURIOUS CORRELATION AND RELATIONSHIP 
BETWEEN TESTS 


GODFREY H. THOMSON AND RUDOLF PINTNER 
Teachers College, Columbia University 


1. Object of the Paper.—The increase in the number of intelligence 
tests has naturally led to frequent comparisons between tests, stated 
in the form of correlation coefficients. Any number of such coeffi- 
cients can be found in the literature, and they are not infrequently 
regarded as more or less exact measures of the relationships existing 
between tests, without further inquiry. 

As has been pointed out by several writers, however, it is a grave 
error to assume that a high coefficient indicates a high degree of rela- 
tionship unless attention is paid to the number of cases involved, the 
spread in chronological age of the subjects examined, or the homo- 
geneity in mental ability of the group, all of which factors may exert a 
profound influence on the correlation coefficient and may entirely 
vitiate direct comparison between coefficients. 

The particular object of the present paper is to examine and eluci- 
date the manner in which two other forms of error may creep into 
work involving intelligence quotients and mental ages. These particu- 
lar errors are not unconnected with the factors mentioned in the last 
paragraph, with which we shall therefore also be concerned. The 
first of these errors is due to spurious index correlation, the second to 
the influence of the correlation between chronological and mental age. 

The thesis which we shall propose and endeavor to support is that the only safe measure for correlational comparison of tests is the correlation of mental ages for constant chronological age or, what is identical with this, of intelligence quotients for constant chronological age. As a concrete example of spurious index correlation between IQ's consider either of the following, where the tests concerned are the National Intelligence Test and the Pintner Non-language Test.¹














                                   School 12            School 16
                               Grades III and IV        Grade V A
Correlation of MA's ..........        .29                  .37
Correlation of IQ's ..........        .54                  .64
Number of cases ..............        286                   81





¹ This will be called Pintner's example throughout.


Here the rise in correlation from about .3 to about .6 is entirely a spurious index boosting; but the information as here given does not enable the reader to know this, and had only the IQ correlations been presented they might well have passed muster as correct. Their "probable errors" are only .03 and .04 respectively.

Index correlation in its simplest form is best seen in the instance used by Sir Francis Galton to explain it, in the Proc. Roy. Soc., London, 1897. To a modification of that instance, and some extensions of it made for our present purpose, we now turn.

2. Spurious Index Correlation.—Let us suppose that a test is so 
bad that it is, within given limits, entirely a matter of chance what 
mental age it assigns to a subject and that mental ages thus assigned 
are therefore distributed entirely at random, without any association 
whatever with the intelligence or with the chronological age of the 
pupil. 

Two such tests, each assigning mental ages at random, will of 
course show no correlation if mental ages are employed. But if IQ’s 
are taken, they will show a considerable correlation. To make this 
clear take Galton’s example with our new nomenclature, and with the 
numbers modified to look like possible mental and chronological ages. 
Let the correlation table for a certain set of children’s MA’s by these 
two utterly bad tests be as follows, showing no correlation: 


GALTON’S EXAMPLE

                       First mental age
                    8       9      10     Totals
Second       8      4       8       4       16
mental       9      8      16       8       32
age         10      4       8       4       16
Totals             16      32      16       64




















Let the distribution of the actual ages of the children be also over 
8, 9 and 10 years in the proportions 1:2:1, and let there be no correla- 
tion between these actual ages and either of the above mental ages. 
It will be admitted that neither of these hypothetical tests measures 
anything of any use whatever, and that they have no organic connection with one another. But if we form the IQ's, we get a correlation between them of r = .5.








Spurious Correlation and Relationship 435 


For since there is no correlation between the chronological age and 
the mental age, the 10 year old children, who form one quarter of the 
whole, will have IQ’s distributed thus: 


GALTON’S EXAMPLE

                      First IQ
                 80      90     100
Second    80      1       2       1
IQ        90      2       4       2
         100      1       2       1

















Similarly the nine year old children, who form one half of the whole, give:

GALTON’S EXAMPLE

                      First IQ
                 89     100     111
Second    89      2       4       2
IQ       100      4       8       4
         111      2       4       2
And the eight year old children, 16 in number, give:

GALTON’S EXAMPLE

                      First IQ
                100    112½     125
Second   100      1       2       1
IQ      112½      2       4       2
         125      1       2       1

















No one of these correlation tables shows any correlation when taken alone: the correlation of IQ's for constant age contains no spurious index factor. But when the IQ's of all 64 children are entered upon one form, we get the following table which, as actual calculation will show, gives r = .5.







GALTON’S EXAMPLE

                                 First IQ
Second IQ     80    89    90   100   111  112½   125  Totals
   80          1     .     2     1     .     .     .       4
   89          .     2     .     4     2     .     .       8
   90          2     .     4     2     .     .     .       8
  100          1     4     2    10     4     2     1      24
  111          .     2     .     4     2     .     .       8
  112½         .     .     .     2     .     4     2       8
  125          .     .     .     1     .     2     1       4
Totals         4     8     8    24     8     8     4      64























In this instance, therefore, where the tests are not correlated at 
all, the IQ’s are correlated .5. 
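The .5 can be confirmed by brute force. The Python sketch below enumerates Galton's 64 children (every combination of chronological age and the two random mental ages, weighted 1:2:1 on each) and correlates the exact ratios 100 × MA/CA rather than the rounded IQs of the tables; the names are illustrative:

```python
from itertools import product
from math import sqrt

# Chronological age and both "mental ages" take the values 8, 9, 10
# in proportions 1:2:1, all mutually independent.
FREQ = {8: 1, 9: 2, 10: 1}

def weighted_r(rows):
    """Product-moment correlation for (weight, x, y) rows."""
    n = sum(w for w, _, _ in rows)
    mx = sum(w * x for w, x, _ in rows) / n
    my = sum(w * y for w, _, y in rows) / n
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in rows)
    sxx = sum(w * (x - mx) ** 2 for w, x, _ in rows)
    syy = sum(w * (y - my) ** 2 for w, _, y in rows)
    return sxy / sqrt(sxx * syy)

children = [(fa * f1 * f2, a, m1, m2)
            for (a, fa), (m1, f1), (m2, f2) in product(FREQ.items(), repeat=3)]
assert sum(w for w, *_ in children) == 64

r_ma = weighted_r([(w, m1, m2) for w, a, m1, m2 in children])
r_iq = weighted_r([(w, 100 * m1 / a, 100 * m2 / a) for w, a, m1, m2 in children])
print(f"r(MA1, MA2) = {r_ma:.3f}, r(IQ1, IQ2) = {r_iq:.3f}")
```

The mental ages come out uncorrelated, while the IQs correlate about .50 (.503 with unrounded ratios).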
If we let

a = chronological age
m₁ = child's mental age by Test 1
m₂ = child's mental age by Test 2
q₁ = child's IQ by Test 1
q₂ = child's IQ by Test 2

then the formula giving the spurious correlation which may arise between the IQ's even if the mental ages are quite uncorrelated¹ is:

$$r_{q_1q_2} = \frac{v_a^2}{\sqrt{(v_{m_1}^2 + v_a^2)(v_{m_2}^2 + v_a^2)}} \qquad (1)$$

wherein

$$v_a = \frac{\sigma_a}{\text{mean } a}, \qquad v_{m_1} = \frac{\sigma_{m_1}}{\text{mean } m_1}, \qquad v_{m_2} = \frac{\sigma_{m_2}}{\text{mean } m_2},$$

i.e., the v's are the variability coefficients of the chronological and the two mental ages.

Clearly if $v_a = v_{m_1} = v_{m_2}$, as here, then $r_{q_1q_2} = \tfrac{1}{2}$.
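The formula is simple to evaluate. A minimal Python transcription, applied to the 1:2:1 distribution over 8, 9 and 10 years assumed in Galton's example (helper names are illustrative):

```python
from math import sqrt

def variability(freqs):
    """Coefficient of variability sigma / mean of a {value: frequency} table."""
    n = sum(freqs.values())
    mean = sum(v * f for v, f in freqs.items()) / n
    var = sum(f * (v - mean) ** 2 for v, f in freqs.items()) / n
    return sqrt(var) / mean

def spurious_iq_r(v_a, v_m1, v_m2):
    """Correlation of q1 = m1/a and q2 = m2/a when a, m1 and m2 are
    mutually uncorrelated (the formula above)."""
    return v_a ** 2 / sqrt((v_m1 ** 2 + v_a ** 2) * (v_m2 ** 2 + v_a ** 2))

v = variability({8: 1, 9: 2, 10: 1})  # same 1:2:1 spread for a, m1 and m2
print(f"v = {v:.4f}, r_q1q2 = {spurious_iq_r(v, v, v):.3f}")
```

Here v = .0786 for all three variables, and the formula returns ½; doubling the variability of both mental ages would lower it to .2.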

¹ Pearson, K.: Proc. Roy. Soc., Lond., 1897, pp. 489-498.

If the three variables a, m₁ and m₂ are also really and organically correlated, that is, if the mental ages are correlated with one another and with chronological age, then the formula is more complicated (see Pearson, ibid.):


$$r_{q_1q_2} = \frac{r_{m_1m_2}v_{m_1}v_{m_2} - r_{m_1a}v_{m_1}v_a - r_{m_2a}v_{m_2}v_a + v_a^2}{\sqrt{(v_{m_1}^2 + v_a^2 - 2r_{m_1a}v_{m_1}v_a)(v_{m_2}^2 + v_a^2 - 2r_{m_2a}v_{m_2}v_a)}} \qquad (2)$$

This last formula gives the correlation which will be found between the IQ's when the real correlation between the MA's is $r_{m_1m_2}$. The excess of $r_{q_1q_2}$ over $r_{m_1m_2}$ is in this case due to index boosting. If the MA is positively correlated with CA, the boosting will not be so serious as in the instance worked out; if MA is negatively correlated with CA, the boosting will be more serious. Each of these cases may be realized in a group of children, and in any case, except by a miracle, $r_{qq}$, the correlation between IQ's, will not be the same as $r_{mm}$, the correlation between the MA's. As an illustration of these principles let us turn again to the examples already mentioned from Schools 12 and 16.
3. A School Example in Detail.—The following table gives the full set of correlations in the experiments already partly quoted:








TABLE I

Pintner's Example

                            School 12      School 16
                                r              r
r(m₁m₂) ...............       +.29           +.37
r(m₁a) ................       −.03           −.22
r(m₂a) ................       −.08           +.06
r(m₁m₂·a) .............       +.29           +.39
r(q₁q₂) ...............       +.54           +.64
r(q₁a) ................       −.61           −.775
r(q₂a) ................       −.56           −.55
r(q₁q₂·a) .............       +.30           +.405
N .....................        286             81
Grades ................    III and IV        IV A

m₁ = Mental Age on National Intelligence Test.
m₂ = Mental Age on Non-language Test.
q₁ = Intelligence Quotient on National Intelligence Test.
q₂ = IQ on Non-language Test.
a = Chronological Age.


The correlations here given between IQ's are those actually worked out from the data. We can use them to illustrate our Formula 2, for which purpose we need, in addition, the values of the sigmas and the mean values of the mental and chronological ages. For School 16 these are (using a bar for the mean):

$\bar m_1 = 8.65$, $\sigma_{m_1}^2 = .864$
$\bar m_2 = 9.19$, $\sigma_{m_2}^2 = 2.185$
$\bar a = 10.45$, $\sigma_a^2 = 2.012$

and Formula 2 gives $r_{q_1q_2} = .68$, which agrees well enough with the value actually found, .64, since only approximate calculation by slide rule has been employed and grouping errors also come in. For School 12 a similar result is obtained.
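Formula 2 can be evaluated directly. The Python sketch below reads the quoted .864, 2.185 and 2.012 as variances, as the notation suggests, and assumes the second and third rows of Table I are $r_{m_1a}$ and $r_{m_2a}$; the function name is illustrative:

```python
from math import sqrt

def iq_correlation(r_m1m2, r_m1a, r_m2a, v_m1, v_m2, v_a):
    """Formula 2: correlation of q1 = m1/a and q2 = m2/a from the
    correlations among m1, m2, a and their coefficients of variability."""
    num = r_m1m2 * v_m1 * v_m2 - r_m1a * v_m1 * v_a - r_m2a * v_m2 * v_a + v_a ** 2
    den = sqrt((v_m1 ** 2 + v_a ** 2 - 2 * r_m1a * v_m1 * v_a)
               * (v_m2 ** 2 + v_a ** 2 - 2 * r_m2a * v_m2 * v_a))
    return num / den

# School 16: means and (assumed) variances quoted in the text
v_m1 = sqrt(0.864) / 8.65
v_m2 = sqrt(2.185) / 9.19
v_a = sqrt(2.012) / 10.45

# Assumed reading of Table I: r_m1m2 = .37, r_m1a = -.22, r_m2a = .06
r = iq_correlation(0.37, -0.22, 0.06, v_m1, v_m2, v_a)
print(f"Formula 2 for School 16: r_q1q2 = {r:.3f}")
```

This gives about .686, in line with the slide-rule .68 of the text. With all three correlations set to zero the function reduces to the simpler formula for uncorrelated variables, and for two identical tests it returns 1.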

It will be seen from a study of Table I that, within the groups here used, there was hardly any correlation between MA and CA. It is for this reason that the value of $r_{qq}$ is so much higher than $r_{mm}$. For the same reason, the correlations of MA's for constant CA do not differ much from the raw correlations: $r_{m_1m_2\cdot a}$ does not here differ much from $r_{m_1m_2}$. In this instance the correlation of MA's is real and that of IQ's spurious, and when $r_{q_1q_2\cdot a}$, the correlation of IQ's for constant age, is formed, it brings the correlation down from the spurious value to the same as $r_{mm\cdot a}$ (the slight differences in the table being attributable to approximations of calculation by grouping or by using the slide rule). In other instances $r_{qq}$ might be nearer the true value than $r_{mm}$, and unless the correlations of MA and IQ with age are given, one cannot tell which is more nearly correct. To balance the above example we might turn to cases where the conditions are just the opposite.
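The constant-age rows of Table I are ordinary first-order partial correlations, $r_{12\cdot a} = (r_{12} - r_{1a}r_{2a})/\sqrt{(1 - r_{1a}^2)(1 - r_{2a}^2)}$. The Python sketch below recomputes them from the other rows, assuming those rows are the correlations named in the legend and taking the IQ-age correlations as negative, as they must be when MA is nearly uncorrelated with CA. Names are illustrative:

```python
from math import sqrt

def partial_r(r_xy, r_xa, r_ya):
    """First-order partial correlation of x and y with a held constant."""
    return (r_xy - r_xa * r_ya) / sqrt((1 - r_xa ** 2) * (1 - r_ya ** 2))

# (r_12, r_1a, r_2a) for mental ages and for IQs, from Table I
table_i = {
    "School 12, MA": (0.29, -0.03, -0.08),
    "School 16, MA": (0.37, -0.22, 0.06),
    "School 12, IQ": (0.54, -0.61, -0.56),
    "School 16, IQ": (0.64, -0.775, -0.55),
}
for name, (r12, r1a, r2a) in table_i.items():
    print(f"{name}: r12.a = {partial_r(r12, r1a, r2a):.3f}")
```

The results (.289, .394, .302, .405) match the printed .29, .39, .30 and .405.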

4. Spurious Mutual Correlation with Age.—It is quite possible for two tests to have no organic connection with one another, and yet for the MA found by either to correlate highly with chronological age, up to even .7 in extreme cases. In such a case there might be no correlation between IQ's and yet, if the cases were spread well over a long range of chronological age, there might be a very high correlation of MA's between the tests.

As a small simple hypothetical case suppose five children aged 
respectively 4, 6, 8, 10 and 12 years have mental ages, by two tests, 
as given in this table: 


CA          MA by Test 1      MA by Test 2
 4               3.2               4.0
 6               6.6               6.6
 8               9.6               7.2
10              10.0              12.0
12              10.8               9.6








The correlation $r_{m_1m_2} = .85$. But consider the table of IQ's:

IQ by Test 1      IQ by Test 2
     80               100
    110               110
    120                90
    100               120
     90                80

The correlation $r_{q_1q_2} = .10$.

Which of these, $r_{m_1m_2}$ or $r_{q_1q_2}$, gives the true picture of the organic connection between the tests?

Very little consideration is needed to convince one that on this occasion it is $r_{m_1m_2}$ which is spurious as a measure of the relationship between the tests, while $r_{q_1q_2}$ correctly portrays their real association, more or less. This example of only five children is, of course, exceedingly artificial. No one would dream of quoting correlations from it, and moreover the reader may protest that the trick whereby a false $r_{m_1m_2}$ is procured is too obvious to occur in actual cases, or to deceive anyone.
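Both coefficients of the five-child example can be recomputed in a few lines of Python (the helper name is illustrative):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

ca = [4, 6, 8, 10, 12]
ma1 = [3.2, 6.6, 9.6, 10.0, 10.8]
ma2 = [4.0, 6.6, 7.2, 12.0, 9.6]
iq1 = [round(100 * m / a) for m, a in zip(ma1, ca)]
iq2 = [round(100 * m / a) for m, a in zip(ma2, ca)]

print("IQ by Test 1:", iq1)
print("IQ by Test 2:", iq2)
print(f"r(MA) = {pearson_r(ma1, ma2):.2f}, r(IQ) = {pearson_r(iq1, iq2):.2f}")
```

The mental ages correlate .85 only because both rise with age; once age is divided out, the IQs correlate .10.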

But with a little care, examples can be made to simulate real cases very closely. Here is a correlation table between two mental ages giving $r_{m_1m_2} = 0.80$:


THOMSON’S FIRST EXAMPLE, $r_{m_1m_2} = 0.80$

[Correlation table of mental age by Test 1 (columns, 7 to 12 years) against mental age by Test 2 (rows); each cell gives the number of children and their chronological age, e.g. "12 at 9"; column totals 8, 40, 80, 80, 40, 8; N = 256.]
| 

















There is nothing suspicious or alarming about these data. The correlation of the two mental ages might be given in perfectly good faith in comparing the two tests. Form now the correlation table of IQ's. To enable this to be done, the chronological age of each child is given in the body of the preceding table. Taking the children, for convenience, to be at their exact birthdays, we get a table which condenses as follows:


THOMSON’S FIRST EXAMPLE

                                   q₁
q₂               Below 91    91 to 109    Above 109    Totals
Below 91             32           32            .          64
91 to 109            32           64           32         128
Above 109             .           32           32          64
Totals               64          128           64         256





The correlation $r_{q_1q_2}$ from this or from the complete table is found to be 0.50.
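The value of 0.50 can be read off the condensed table. A Python sketch, assuming equally spaced scores (−1, 0, +1) for the three IQ classes and zero frequencies in the two empty corner cells; names are illustrative:

```python
from math import sqrt

# Condensed IQ table: rows q2, columns q1, each grouped as
# below 91 / 91 to 109 / above 109 (scored -1, 0, +1)
TABLE = [
    [32, 32, 0],
    [32, 64, 32],
    [0, 32, 32],
]
SCORES = (-1, 0, 1)

def table_r(table, scores=SCORES):
    """Product-moment correlation of a square frequency table whose
    rows and columns carry the given class scores."""
    k = len(scores)
    cells = [(table[i][j], scores[j], scores[i]) for i in range(k) for j in range(k)]
    n = sum(f for f, _, _ in cells)
    mx = sum(f * x for f, x, _ in cells) / n
    my = sum(f * y for f, _, y in cells) / n
    sxy = sum(f * (x - mx) * (y - my) for f, x, y in cells)
    sxx = sum(f * (x - mx) ** 2 for f, x, _ in cells)
    syy = sum(f * (y - my) ** 2 for f, _, y in cells)
    return sxy / sqrt(sxx * syy)

print(table_r(TABLE))
```

With this scoring the table gives exactly 0.5.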

Now which of these, 0.8 or 0.5, is to be taken as the proper measure of the resemblance of Tests 1 and 2? We cannot tell without knowing the correlations of $m_1$ and $m_2$ with $a$, and the correlations of $q_1$ and $q_2$ with $a$. The table of $m_1$ with $a$ is as follows:


THOMSON’S FIRST EXAMPLE

                     Chronological age, a
                   8      9     10     11    Totals
Mental     7       8      .      .      .        8
age, m₁    8      16     24      .      .       40
           9       8     48     24      .       80
          10       .     24     48      8       80
          11       .      .     24     16       40
          12       .      .      .      8        8
Totals            32     96     96     32      256

giving $\sigma_a^2 = 0.75$, $\sigma_{m_1}^2 = 1.25$ and $r_{am_1} = \sigma_a/\sigma_{m_1} = \sqrt{15}/5 = 0.774$.













The correlation table of $a$ and $m_2$ shows a similar distribution and gives $r_{am_2} = \sqrt{15}/5 = 0.774$.


From these we find that $r_{m_1m_2\cdot a}$ (the quantity which, we are urging, should form the sole basis of comparison of tests) is

$$r_{m_1m_2\cdot a} = \frac{0.8 - 0.774 \times 0.774}{\sqrt{(1 - 0.774^2)(1 - 0.774^2)}} = 0.5.$$

In this instance, therefore, the real connection is shown correctly by $r_{q_1q_2}$, which was equal to 0.5. If this reasoning is correct, $r_{q_1q_2\cdot a}$ should not be different from the full correlation $r_{q_1q_2}$.
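The figures .774 and 0.5 can be recomputed from the table of $m_1$ against $a$. A Python sketch, taking $r_{m_1m_2} = 0.8$ and $r_{am_2} = r_{am_1}$ as stated in the text; names are illustrative:

```python
from math import sqrt

AGES = (8, 9, 10, 11)
# Rows: mental age m1; columns: chronological age a = 8, 9, 10, 11
M1_BY_AGE = {
    7: (8, 0, 0, 0),
    8: (16, 24, 0, 0),
    9: (8, 48, 24, 0),
    10: (0, 24, 48, 8),
    11: (0, 0, 24, 16),
    12: (0, 0, 0, 8),
}

cells = [(f, a, m) for m, row in M1_BY_AGE.items() for a, f in zip(AGES, row) if f]
n = sum(f for f, _, _ in cells)
mean_a = sum(f * a for f, a, _ in cells) / n
mean_m = sum(f * m for f, _, m in cells) / n
var_a = sum(f * (a - mean_a) ** 2 for f, a, _ in cells) / n
var_m = sum(f * (m - mean_m) ** 2 for f, _, m in cells) / n
cov_am = sum(f * (a - mean_a) * (m - mean_m) for f, a, m in cells) / n
r_am = cov_am / sqrt(var_a * var_m)

# Partial correlation of m1 and m2 for constant a, with r_m1m2 = 0.8 and
# r_am2 = r_am1, so the denominator reduces to 1 - r_am**2
r_m1m2_a = (0.8 - r_am * r_am) / (1 - r_am ** 2)

print(f"N = {n}, var_a = {var_a}, var_m1 = {var_m}, r_am1 = {r_am:.4f}")
print(f"r_m1m2.a = {r_m1m2_a:.3f}")
```

It gives $\sigma_a^2 = .75$, $\sigma_{m_1}^2 = 1.25$, $r_{am_1} = \sqrt{.6} \approx .7746$ and a partial correlation of 0.5.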

To test this, form the tables for $q_1$ and $a$ and for $q_2$ and $a$. Each of them has the following appearance:

THOMSON’S FIRST EXAMPLE

                        Chronological age, a
                      8      9     10     11    Totals
q     87 to 91        8     24     24      8       64
      91 to 109      16     48     48     16      128
      109 to 113      8     24     24      8       64
Totals               32     96     96     32      256

showing $r_{q_1a} = 0$, as is also $r_{q_2a}$, whence $r_{q_1q_2\cdot a} = r_{q_1q_2} = 0.5$.

5. Index Correlation between Age and Intelligence Quotient.—It is important to examine the machinery by which one of us, in making the above artificial example, ensured that $r_{qq}$ would be exactly right. This might be done by working from a table of age and intelligence quotient which shows no correlation, but it is more instructive to look at the situation as follows. Just as spurious index correlation may arise between two intelligence quotients, so spurious correlation may arise between one intelligence quotient and age.

From the same paper of Pearson's whence we have deduced the formulas already used in this paper, it can be shown that if $a$, $m$ and $q$ stand for chronological age, mental age and intelligence quotient, then

$$r_{qa} = \frac{r_{am}v_m - v_a}{\sqrt{v_m^2 + v_a^2 - 2r_{am}v_mv_a}},$$

where the v's are variability coefficients. From this formula it is clear that if intelligence quotient $q$ is to be uncorrelated with age $a$, we must have

$$r_{am} = \frac{v_a}{v_m},$$

and if age and mental age have the same mean this becomes

$$r_{am} = \frac{\sigma_a}{\sigma_m}.$$


Now it is our hope and belief, as workers with intelligence tests, that intelligence quotient is uncorrelated with chronological age, that is, that intelligence quotient remains more or less constant throughout life. If this be so, then the correlation between age and mental age will be the ratio $v_a/v_m$ given above; and in any sample of the population which agrees with the non-correlation of quotient and age, the variability of chronological age will be less than that of mental age in the proportion $r_{am}$. Thomson’s first example already given shows this, as will be seen from the $a$ and $m_1$ table. The children actually ranged from 8 to 11 years, but mentally from 7 to 12. In that case the correlation between intelligence quotients was a true measure of the resemblance between the two tests, and that between mental ages a spurious one, whereas the opposite was the case in Pintner’s example in Sec. 3. It would seem from the considerations of this section that one criterion for the unselected nature of any group is that

$$ r_{am} = \frac{v_a}{v_m} $$

that is, the coefficient of correlation of age and mental age in such a group should be approximately equal to the ratio of the coefficients of variability of chronological and mental ages.
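Pearson's index formula quoted above is easy to evaluate numerically. The sketch below uses hypothetical variability coefficients $v_a = 0.08$ and $v_m = 0.12$ (our assumption, not data from the paper) to show that setting $r_{am} = v_a/v_m$ makes $r_{aq}$ vanish, while a smaller $r_{am}$ produces a negative age-quotient correlation.

```python
import math

def r_age_iq(r_am: float, v_a: float, v_m: float) -> float:
    """Approximate correlation of age a with the quotient q = m / a
    (Pearson's formula for an index, as quoted in the text)."""
    return (r_am * v_m - v_a) / math.sqrt(v_m**2 + v_a**2 - 2 * r_am * v_a * v_m)

v_a, v_m = 0.08, 0.12                 # hypothetical variability coefficients
r_am_unselected = v_a / v_m           # the criterion r_am = v_a / v_m
print(r_age_iq(r_am_unselected, v_a, v_m))   # essentially zero
print(r_age_iq(0.5, v_a, v_m))               # r_am below the ratio: negative
```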

6. True Correlation Differing from Both Correlation of MA’s and 
Correlation of IQ’s.—In Pintner’s example the correlation of mental 
ages was true; that of intelligence quotients was spurious. In Thomson’s first example already given the reverse was the case. It can easily happen, however, that neither of these values is a true correlation. Furthermore, the true value need not lie between them (although it probably will do so as a rule). The following artificial example (Thomson’s second example) shows a case of the value of $r_{m_1m_2\cdot a}$ or $r_{q_1q_2\cdot a}$ being less than both entire values.

Let 256 children be distributed in mental age $m_1$ by one test and in mental age $m_2$ by another test as shown in the accompanying correlation table, while their chronological ages are as indicated in the body of the table:

THOMSON’S SECOND EXAMPLE

[Correlation table: mental age m1 (8 to 11 years) against mental age m2 (8 to 11 years) for 256 children, with the chronological ages entered in the body of the table in the form "8 at 9" (eight children aged 9); marginal totals 32, 96, 96, 32 for each mental age; total 256.]
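This gives us

$$ r_{m_1m_2} = 0.67 $$
$$ \sigma_a^2 = \sigma_{m_1}^2 = \sigma_{m_2}^2 = 0.75 \ \text{(years}^2\text{)} $$
$$ r_{am_1} = r_{am_2} = 0.67 $$
$$ r_{aq_1} = r_{aq_2} = -0.41 $$
$$ r_{q_1q_2} = 0.5 $$

Is $r_{m_1m_2} = 0.67$ or $r_{q_1q_2} = 0.5$ the true measure of correlation of Tests 1 and 2? The measure which we are advocating is

$$ r_{m_1m_2\cdot a} \ (\text{or } r_{q_1q_2\cdot a}) = 0.4 $$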


In this case, therefore, both of the ordinary measures are, in our opinion, inflated: the one by spurious index correlation, the other by spurious correlation with chronological age.
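The figures of the second example hang together, as a short computation with Pearson's approximate formulas shows. The sketch rests on an assumption of ours: age and both mental ages share a common mean, so that with equal standard deviations all three variability coefficients are equal and the common value cancels. It takes $r_{m_1m_2} = r_{am_1} = r_{am_2} = 2/3$; the age correlations are the values implied by the quoted partial correlation of 0.4.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def r_quotients(r12, r1a, r2a, v1, v2, va):
    """Pearson's approximate correlation of q1 = m1/a with q2 = m2/a."""
    num = r12 * v1 * v2 - r1a * v1 * va - r2a * v2 * va + va**2
    d1 = math.sqrt(v1**2 + va**2 - 2 * r1a * v1 * va)
    d2 = math.sqrt(v2**2 + va**2 - 2 * r2a * v2 * va)
    return num / (d1 * d2)

def r_age_quotient(r_am, v_a, v_m):
    """Pearson's approximate correlation of a with q = m/a."""
    return (r_am * v_m - v_a) / math.sqrt(v_m**2 + v_a**2 - 2 * r_am * v_a * v_m)

v = 0.1                              # assumed common variability coefficient (cancels)
r12 = r1a = r2a = 2 / 3
r_q1q2 = r_quotients(r12, r1a, r2a, v, v, v)
r_aq = r_age_quotient(r1a, v, v)
print(round(r_q1q2, 2), round(r_aq, 2))
print(round(partial_r(r12, r1a, r2a), 2), round(partial_r(r_q1q2, r_aq, r_aq), 2))
```

Both partial correlations come out at 0.4, which is why the authors can write $r_{m_1m_2\cdot a}$ and $r_{q_1q_2\cdot a}$ interchangeably.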




7. Some Practical Questions in Which These Principles are Impor- 
tant.—It seems to us not at all unlikely that by neglect of the above 
principles erroneous deductions may, in the past, have been drawn in 
matters such as the following: (1) The degree of resemblance of two tests. (2) Reliability coefficients. (3) The constancy of the IQ as judged by correlations between IQ’s on the same test at different times. (4) Correlation of IQ with achievement quotient where, if chronological age, mental age, and performance age are all totally uncorrelated, there will be a spurious negative correlation equal to

$$ -\frac{v_m^2}{\sqrt{(v_m^2 + v_a^2)(v_m^2 + v_p^2)}} $$

where the $v$’s are the variability coefficients ($\sigma \div$ mean) of the three quantities, $m$ mental age, $a$ chronological age and $p$ performance age. This fraction will rise to $-0.5$ if the $v$’s are all equal.
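The size of this spurious correlation is easy to see by simulation. The sketch below is ours, not the authors': it draws mutually independent mental, chronological and performance ages (normal, mean 10, standard deviation 1, so every $v$ is 0.1; these distributions are assumptions for illustration), forms IQ $= m/a$ and achievement quotient $= p/m$, and compares the sample correlation with the formula.

```python
import math
import random

def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1924)                  # fixed seed so the run is repeatable
n = 50_000
m = [rng.gauss(10, 1) for _ in range(n)]   # mental age
a = [rng.gauss(10, 1) for _ in range(n)]   # chronological age
p = [rng.gauss(10, 1) for _ in range(n)]   # performance age
iq = [mi / ai for mi, ai in zip(m, a)]     # intelligence quotient m/a
aq = [pi / mi for pi, mi in zip(p, m)]     # achievement quotient p/m

v_m = v_a = v_p = 0.1
predicted = -v_m**2 / math.sqrt((v_m**2 + v_a**2) * (v_m**2 + v_p**2))
simulated = pearson(iq, aq)
print(round(predicted, 3), round(simulated, 3))   # both close to -0.5
```

The formula is a first-order approximation, so small discrepancies grow as the variability coefficients get larger.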

All these dangers would disappear if in each case the partial correlation coefficient for constant age were used, except in the last instance, where it is mental age which is common to the two quotients and which must be “partialled out.” The still broader caution is that anyone reporting on tests should give all correlations of all quantities with one another.
















ON SCORING MULTIPLE RESPONSE TESTS 


KARL J. HOLZINGER 
University of Chicago 


In tests where the items may be marked “true” or “false” it is customary to score the result “right minus wrong,” or in symbols, $S = R - W$. This method is assumed to correct for the element of guessing which may be involved. If a person guesses on eight such items he would be expected to get half of them right by chance, so that his score should be zero and not four. By scoring $R - W$ this correction is evidently brought about, but it should be noted that this person could miss four out of the eight items and yet do no guessing at all. He would then be unjustly penalized by the formula, which treats every error as evidence of two guesses (one lucky, one not).

If three responses are open to choice in answering each question, the above formula becomes $S = R - \frac{1}{2}W$, and for four responses, $S = R - \frac{1}{3}W$. The most general form for $n$ responses is, by induction,

$$ S = R - \frac{W}{n - 1} \qquad (1) $$
It is the purpose of this note to point out that no such correction for guessing is necessary when the tests are administered so that all pupils may finish.
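The rationale of Formula (1) can be checked with exact fractions: a pupil who guesses blindly on $k$ items with $n$ choices expects $k/n$ right and $k(n-1)/n$ wrong, so the corrected score has expectation zero. The sketch also reproduces the objection raised above, that a pupil with four honest errors on eight true-false items scores zero. The function names are ours.

```python
from fractions import Fraction

def corrected_score(rights, wrongs, n_choices):
    """Formula (1): S = R - W / (n - 1)."""
    return Fraction(rights) - Fraction(wrongs, n_choices - 1)

def expected_guess_score(k_items, n_choices):
    """Expected Formula (1) score when all k items are blind guesses."""
    exp_right = Fraction(k_items, n_choices)
    exp_wrong = k_items - exp_right
    return exp_right - exp_wrong / (n_choices - 1)

for n in (2, 3, 4, 5):
    print(n, expected_guess_score(8, n))   # 0 for every n

# Four right, four wrong on true-false: scored 0 even if no item was guessed
print(corrected_score(4, 4, 2))
```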

For convenience in the following proof, Equation (1) may be written in the form

$$ S = R + CW \qquad (2) $$

If the number of questions or items in the test be denoted by $A$, this will also be the number of attempts for each pupil, provided all are allowed to finish and to try each item. Evidently $R + W = A$, or $W = A - R$. Substituting this expression in (2) and remembering that $A$ is a constant gives $S = R + C(A - R) = aR + b$, where $a = 1 - C$ and $b = CA$ are constants. Thus no matter what value be assigned to $C$, the scoring Formula 1 or 2 becomes a linear function of $R$ alone.
In order to see whether or not it is profitable to use Formula 1 in scoring multiple response tests, we shall obtain the correlation between score by the formula and score by rights alone. The general expression for such correlation will be $r_{SR} = r_{(aR+b)R} = +1.00$, since $a = 1 - C$ is positive whenever $C = -1/(n - 1)$. This theorem is obvious by inspection, but for those unfamiliar with mathematical statistics a proof will be given.




Suppose there are $N$ pupils whose papers are scored by both methods. The pairs of scores for correlation may then be set down as follows:

Pupil     Score by (1)            Score by rights alone
1         S_1 = aR_1 + b          R_1
2         S_2 = aR_2 + b          R_2
3         S_3 = aR_3 + b          R_3
…         …                       …
N         S_N = aR_N + b          R_N
Sums      ΣS = aΣR + Σb           ΣR


Since $M_S = \Sigma S / N$, $M_R = \Sigma R / N$ and $\Sigma b = Nb$, we may write $M_S = aM_R + b$.

If now $s = S - M_S$ and $r = R - M_R$ (i.e., deviations from means), it is evident that $s = S - M_S = aR + b - (aM_R + b) = a(R - M_R) = ar$.

The required correlation then becomes $r_{SR} = r_{ar\cdot r}$. The product moment formula for correlation may be written

$$ r_{xy} = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \, \Sigma y^2}} $$

where the variables are deviations from the respective means. Setting $x = ar$ and $y = r$ we have

$$ r_{ar\cdot r} = \frac{\Sigma ar \cdot r}{\sqrt{\Sigma a^2r^2 \, \Sigma r^2}} = \frac{a\Sigma r^2}{a\Sigma r^2} = +1.00 $$

A numerical example will illustrate the foregoing proof. Suppose there are twenty questions in a true-false test and that five pupils scored as follows:








Pupil   S = R − W    R     s     r     s²    r²    sr
  1         0       10    −8    −4     64    16    32
  2        20       20    12     6    144    36    72
  3        10       15     2     1      4     1     2
  4         0       10    −8    −4     64    16    32
  5        10       15     2     1      4     1     2
        M_S = 8   M_R = 14          Σs² = 280   Σr² = 70   Σsr = 140

Then $r_{SR} = +1.00$.
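The numerical example can be reproduced in a few lines. This sketch (ours) rebuilds the table from the five pupils' rights, taking $W = 20 - R$ since every pupil finishes, and computes the product-moment correlation from deviations exactly as in the proof.

```python
import math

N_ITEMS = 20
rights = [10, 20, 15, 10, 15]                  # R for pupils 1-5
scores = [r - (N_ITEMS - r) for r in rights]   # S = R - W, with W = 20 - R

m_s = sum(scores) / len(scores)      # M_S
m_r = sum(rights) / len(rights)      # M_R
s = [x - m_s for x in scores]        # deviations s
r = [x - m_r for x in rights]        # deviations r

sum_ss = sum(x * x for x in s)
sum_rr = sum(x * x for x in r)
sum_sr = sum(x * y for x, y in zip(s, r))
r_SR = sum_sr / math.sqrt(sum_ss * sum_rr)
print(scores, m_s, m_r, sum_ss, sum_rr, sum_sr, r_SR)
```

Here $a = 2$ and $b = -20$, so $s = 2r$ pupil by pupil and the correlation must be exactly 1.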

The above results show that if all pupils are allowed to finish a multiple response test, the simple score of rights alone is perfectly correlated with that obtained by the Formula $S = R + CW$. The relative standing of the pupils is identical by the two methods of scoring, and where such standing is desired it is obviously better to use the simple formula $S = R$.

In case the test is timed so that all do not finish, the above results do not hold. Thurstone¹ has proposed a method for determining a suitable value to be assigned to $C$, but the procedure involves giving the test to a fairly large group before the proper formula can be derived, and the subsequent scoring is troublesome.

By allowing all pupils to finish, each is given a chance at all of the problems and the scoring is reduced to the simplest possible terms, i.e., $S = R$.


¹ Thurstone, L. L.: A Method for Scoring Tests. Psychological Bulletin, Vol. XVI, No. 7, July, 1919.

























A STUDY OF THE PREDICTIVE VALUE OF 
CERTAIN KINDS OF SCORES IN 
INTELLIGENCE TESTS 


WILLIAM M. BROWN 


Washington and Lee University 


The history of the origin and development of educational tests 
would seem to indicate among other things that, while there is no 
hard and fast distinction made by educators between intelligence 
and character as such, common parlance sanctions the use of the two 
terms to designate aspects of the personality which differ from each 
other at least in some degree. And, while we have many tests which 
claim to measure intelligence in one or another of its phases, character 
seems to present a less tangible field for experimental study. There 
are, therefore, correspondingly fewer tests for the measurement of 
character traits. Some of these have met with a considerable degree 
of success, but it cannot be said that any one of them is wholly satis- 
factory for the purposes for which it is intended. 

The important point to note in all this, however, is the fact that 
an individual’s performance on an intelligence test, as well as elsewhere, 
practically always involves the exercise at the same time of certain 
character traits and, conversely, certain intellectual factors must 
almost inevitably play some part in the performance on any test of 
character traits. Hence, it is reasonable to assume that every test, 
whether it is intended primarily to test character or intellectual 
factors, at the same time presents opportunities to the individual to 
demonstrate the presence or absence of some one or more factors 
belonging to the other class. The greatest difficulty here consists, of 
course, in finding objective evidence of traits which are not specifically 
tested by the examination in question. 

In the study here reported the attempt has been made to evaluate 
the various types of scores to be found in the Thorndike intelligence 
examination and to examine each type as a possible indicator of some 
more or less well recognized character trait. 


DESCRIPTION OF THE THORNDIKE INTELLIGENCE EXAMINATION 


The test as devised by Professor Thorndike and first used in 1919 was entitled “The Thorndike Intelligence Examination for High School Graduates.”¹ It was divided into three parts, designated
respectively Part I, Part II, and Part III. Of these, Part I was 
largely a speed test and was further subdivided into two sub-tests 
which are similar and therefore comparable throughout. Parts II 
and III, on the other hand, were intended to test the judgment, 
ingenuity, comprehension, and ability for discrimination of the sub- 
ject, less emphasis being placed on speed. The general character of 
each test included in the examination may be seen from Table I, 
which also includes the number of items or “elements” composing 
the test. 


TABLE I.—SHOWING THE CHARACTER AND NUMBER OF ITEMS INCLUDED IN THE THORNDIKE INTELLIGENCE EXAMINATION, EDITION OF 1919

Part I, Forms 1 and 2

No. of test   Description                        Total number of items
 1            Following directions                   5
 2            …                                     10
 3            …                                      8
 4            …                                     10
 5            …                                     10
 6            Synonyms and antonyms                 20
 7            …                                      5
 8            …                                     10
 9            …                                     20
10            …                                      9
11            …                                      8
12            …                                      8
13            Identification (recognition)          20
              Total (one form)                     143
              Total (Forms 1 and 2)                286











¹ For a more complete description of the Thorndike Intelligence Examination the reader is referred to “Measurement in Higher Education,” by Dr. B. D. Wood, Yonkers, 1923.







TABLE I.—Continued

Part II

No. of test   Description                        Total number of items
1a            Comprehension (reading)                6
1b            Comprehension (reading)                6
2             Sentence completion                   12
3             Object completion                     10
4             Mixed relations                        8
5             …                                     10
6             …                                      6
7             Mechanical information                10
8             True-false (information)              60
              Total                                128

Part III

No. of test   Description                        Total number of items
1a            Comprehension (reading)                6
1b            Comprehension (reading)                6
2             Sentence completion                   12
              Total                                 24

Total for entire examination                       438











SCORING THE EXAMINATION


The method of scoring the items in the Thorndike examination is somewhat complicated as compared to that used for many intelligence tests. Each item is weighted by being given an assigned value, which is proportionate to the importance and difficulty of the item itself. In most instances part credit is given where the answer is only partially correct, though this is not done for any of the tests of Part I. In some tests minus and zero scores have been introduced, while in others only the items done correctly are considered. The situations involved in these two cases are quite different, however. In the tests where no deduction is made for inaccuracies the subject is merely given directions as to what he is to do; sometimes guessing is encouraged by the assurance that no penalties will be inflicted for wrong answers. Where minus and zero scores are given, on the other hand, the subject is not only told what he is expected to do but he is warned that wrong answers will cause a deduction from his total score. He must, therefore, put down the answers of whose correctness he is quite sure and then choose between guessing at the other items and leaving them blank. Table II shows the assigned value of each item included in the examination.


TABLE II.—SHOWING THE MAXIMUM SCORES WHICH MAY BE OBTAINED ON EACH ITEM OF THE THORNDIKE INTELLIGENCE EXAMINATION. WHERE NO VALUE IS ENTERED IN THE COLUMN HEADED “MINUS,” IT IS TO BE UNDERSTOOD THAT NO MINUS OR ZERO SCORES ARE POSSIBLE FOR THOSE PARTICULAR ITEMS

                    Maximum value
Part    Item        Plus      Minus
I       …           1
I       …           1
I       …           1
I       …           2
II      …           2
II      …           2         −1
II      …           3         −2
II      …           3         −3
II      …           4         −2
III     …           416       −3
II      …           5


A FORMULA FOR THE TREATMENT OF THE VARIOUS TYPES OF SCORES


From the preceding discussion it will be seen that any method of scoring the Thorndike Intelligence Examination must take into account the following elements: (1) the total number of items possible on the tests; (2) the actual number of items attempted by the subject; (3) the number of plus scores made; (4) the number of minus scores; and (5) the number of zero scores. In the ordinary method of scoring the examination, the zero scores are treated in the same way as the items not tried, i.e., they are simply disregarded. The total of the minus scores is then deducted from the total of the plus scores, after the total value of each has been determined, and the remainder is assigned to the subject as his “net” or final score. This is merely the application of the formula referred to above: the score is equal to the rights minus the wrongs ($R - W$).

But for research purposes, and for determining more accurately the rank of each individual taking the test, a more exact method may be desirable in many cases. For this purpose the following formula is suggested, though it is questionable whether for practical purposes it gives results which differ widely from those obtained by the present “rule of thumb” method of scoring.

Let $p$ represent the total number of right scores in tests where no minus (and consequently no zero) scores are possible;
let $x$ represent the total number of plus scores in all tests where minus (and also zero) scores are possible;
let $y$ represent the total number of minus scores in the last-mentioned tests;
and let $z$ represent the total number of zero scores in the same tests.
Now the total maximum score in all the tests where no minus or zero scores are given is 356; and in all the tests where minus scores are possible the total maximum plus score is 436, while the total maximum minus score (penalty) is only −338, leaving a difference of 98 ($436 - 338$) in favor of the plus scores. This difference is due to the fact, referred to above, that Professor Thorndike has proceeded on the principle of giving more credit for doing a thing correctly than he deducts for doing the same thing incorrectly. In any formula for scoring the tests, therefore, this difference of 98 must be taken into account.

Now in every case where the subject scores zero, theoretically he shows ipso facto that he does not know the correct answer to the problem. Accordingly, it would probably be a more exact treatment of the zero scores if we should make some small deduction for each, rather than disregard them altogether. It would seem fair, therefore, to equate the zero scores with the difference between the maximum total values of the plus and of the minus scores, giving this difference of 98 a minus or “penalty” significance. Obviously the difference cannot be accounted for in terms of the untried items, since there is no way of determining what the subject might have done if he had tried them.

Returning now to the discussion of the formula, we have:

Total number of items in tests where no minus scores are possible            302
Total number of items in tests where minus scores are possible               136
Total number of items in entire examination                                  438

Also: Total maximum score obtainable in tests where minus scores are possible     436
Total maximum score obtainable in tests where no minus scores are possible        356

Hence: Total maximum score obtainable in the entire examination             792
Total maximum minus score obtainable                                        −338
Total score if all items in tests where minus scores are possible are scored zero   −98
From the above we have:

$356/302 = 1.179$ (average value of each correct item in tests where no minus scores are possible; cf. $p$ above);

$436/136 = 3.206$ (average value of each plus item in tests where minus scores are possible; cf. $x$ above);

$-338/136 = -2.485$ (average value of each minus item in tests where minus scores are possible; cf. $y$ above);

$-98/136 = -.721$ (average value of each zero item in tests where minus scores are possible; cf. $z$ above).
Employing the above values we have an equation for determining the intelligence rating for any individual as follows:

$$ \text{Rating} = 1.179p + 3.206x - (2.485y + .721z) $$

For purposes of comparison it was found best to reduce all ratings to a basis comparable to the coefficient of correlation, $r$, in which the maximum rating should be 1.000. This result may be easily accomplished by the use of the following:

$$ \text{Rating} = \frac{1.179p + 3.206x - (2.485y + .721z)}{792} $$


The rating thus obtained may be called the “intelligence index” as distinguished from the “intelligence score.”

Even a casual inspection of the formula will show that no index can be obtained greater than 1 or less than −1. For, if the subject should make a maximum plus score, with no minus and no zero items, the formula would reduce to

$$ \frac{356 + 436 - (0 + 0)}{792} = \frac{792}{792} = 1 $$


On the other hand, if the subject made all minus scores, with no plus and no zero scores, we would have the following:

$$ \frac{0 + 0 - (338 + 0)}{792} = \frac{-338}{792} = -.427 $$

Or, if only the items where minus scores are possible are considered, we would obtain

$$ \frac{0 + 0 - (338 + 0)}{436} = \frac{-338}{436} = -.775 $$


Similarly, if all the items in the entire examination were scored zero, the formula would reduce to the following:

$$ \frac{0 + 0 - (0 + 98)}{792} = \frac{-98}{792} = -.124 $$

Or again, if we consider only the items where minus (and zero) scores are possible, we have:

$$ \frac{0 + 0 - (0 + 98)}{436} = \frac{-98}{436} = -.225 $$


By way of summary we may indicate the results from the use of the formula as follows:

If all items are correct, the index is 1.000.
If all items are minus, the index is −.427 (or −.775).
If all items are zero, the index is −.124 (or −.225).
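The index can be written as a small function. In this sketch (ours) the weights are the four averages derived above, $p$, $x$, $y$ and $z$ are counts of items, and the optional `base` argument switches between the whole examination (792) and the minus-scored tests alone (436); the limiting cases reproduce the summary values.

```python
def intelligence_index(p, x, y, z, base=792):
    """Rating divided by the maximum obtainable score.

    p: right items in tests without minus scores (at most 302)
    x, y, z: plus, minus and zero items in tests with minus scores
             (x + y + z at most 136)
    """
    rating = 1.179 * p + 3.206 * x - (2.485 * y + 0.721 * z)
    return rating / base

print(round(intelligence_index(302, 136, 0, 0), 3))            # all correct
print(round(intelligence_index(0, 0, 136, 0), 3))              # all minus
print(round(intelligence_index(0, 0, 136, 0, base=436), 3))    # minus-scored tests only
print(round(intelligence_index(0, 0, 0, 136), 3))              # all zero
print(round(intelligence_index(0, 0, 0, 136, base=436), 3))
```

Because the weights are rounded to three decimals, the all-correct case gives 1.0001 before rounding rather than exactly 1.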


We may now arrange the various kinds of items in order of value 
as follows: (1) Plus, (2) omitted, (3) zero, and (4) minus. It is con- 
ceivable, however, that in tests of certain types omitted items should 
be penalized, perhaps more heavily than either zero or minus items, 
since the subject should in many instances receive some credit for 
attempting an item. This is on the theory that he supposedly knows 
something about the facts involved in the question though he did 
not give the correct answer. Under such conditions we might have 
the relative order of the various kinds of items changed to: (1) Plus, 
(2) zero, (3) minus, and (4) omitted. 

Again it may be pointed out that there is some truth in the criti- 
cism of Dr. Otis mentioned above. For, instead of assigning the 
values which Professor Thorndike gives, we might score a minus item 
zero and change the values of the other items accordingly. A compari- 
son of the present and the suggested methods of evaluation is given 
below. 











Item         Present    Suggested
Plus           3.2         5.6
Omitted        0           2.4
Zero           −.7         1.7
Minus          −2.4        0










Further discussion of the various possible methods of scoring would 
carry us too far afield for our present purposes.¹

One further comment regarding the formula may be made. Instead of assigning a value to $p$, $x$, $y$, and $z$ based on the arithmetic
average, we may obtain slightly different but more accurate results 
by weighting the averages according to the proportion of scores to 
which each value included in the average is assigned (method of 
weighted averages). Also by careful manipulation, the average 
values of p, x, y, and z might be reduced to unity and the fractions 
eliminated, their relative values still remaining the same. Thus we 
might assign the following approximate values: p = 3, x = 9, y = 7, 
z = 2. In any case, these unit values could be made accurate by the 
proper handling of the items in the examination when it is devised. 

It should be repeated that the chief use of the formula may be 
found in cases where it is necessary to rank the individuals in a given 
group to whom the examination is given in the order of their per- 
formance, either for purposes of comparison within the group itself, 
or, where the group is not too large, for correlating with the criterion 
by the “‘rank-difference’”’ method of correlation. 


METHODS AND RESULTS OF THE STUDY 


In the present study the aim has been to make an investigation of 
the value of the various kinds of data derived from the Thorndike 
Intelligence Examination with especial reference to the relation of 
plus, minus, and zero scores to the probable scholastic performance of 
the student. Wood,² in the investigation already referred to, has
found correlations as high as .672 between intelligence examination 
scores and the scholastic performance of students in Columbia College. 
It is reasonable, therefore, to suppose that the number of plus, minus, 
and zero scores made by any individual taking the intelligence exam- 
ination should have some predictive value in determining that indi- 
vidual’s scholastic performance. But the question was one which 
required actual investigation before a definite answer could be given 
to it. 

¹ In order to make the intelligence index comparable to the intelligence score, the former should in every case be multiplied by .2, which is the procedure followed in transmuting the intelligence score to a percentage basis.

² Wood, B. D., op. cit.

In order to obtain the fullest possible scholastic record for each student investigated, a group of 33 individuals was selected from the class entering Columbia College in September, 1919. This was the first class which took the intelligence examination for satisfying the entrance requirements, and though some individuals preferred to enter under the “old system” of taking college entrance examinations in specified subjects, all candidates for admission were nevertheless required to take the intelligence examination as well. This procedure was followed for the purpose of giving the Admissions Office additional information regarding the candidate in question. As a result, since September, 1919, a candidate’s intelligence score has been one of the determining factors in his case, there being three other important factors, namely, scholastic record in high school, personal character, and (in a limited number of cases) the age at which the candidate began using English as a domestic language.¹

The 33 cases which were selected for study all fell within the so-called “border-line” group, and their intelligence scores as taken from the official records ranged from 70.0 to 75.9, a class interval of 6 instead of 5 being selected in order to provide a larger group for study. Any candidate for admission who scores below 70.0 is likely to be rejected unless he seems to be especially deserving from the standpoint of his other credentials; hence, those scoring only a few points above or below this “critical score” may be considered as doubtful or “border-line” cases. The group may be regarded as fairly homogeneous, and the scores made by the individuals included in the group are quite comparable with each other, since the candidates not only took the same examination under the same conditions, but the forms of the examination papers were in every case the same with only one or two exceptions.

In determining the scholastic performance of the group, the 
attempt was made to obtain the complete record of each person for 
the years 1919-20, 1920-21, and 1921-22. Where this was impossible, 
due to the fact that some did not remain in college throughout the 
full three years but dropped out either temporarily or permanently, 
only full semester records were taken into account, and all partially 
complete semesters were discarded. 

¹ Compare these with the factors mentioned by Burt as entering into the “educational attainment” of English school children: (1) chronological age, (2) school performance, (3) intelligence as measured by reasoning ability, and (4) mental age. Cf. Burt, C.: “Mental and Scholastic Tests,” London, 1921, p. 187.

For purposes of comparison with the other factors concerned, the letter grades as noted on the individual’s scholastic record had to be transmuted into numerical equivalents. The following table, which also takes account of all other notations employed by the Registrar’s Office, and which is based in the main upon a similar table employed by Wood,¹ was used for this purpose.


TABLE III.—SHOWING THE NUMERICAL EQUIVALENTS OF LETTER GRADES AND OTHER NOTATIONS USED TO INDICATE THE STUDENT’S SCHOLASTIC PERFORMANCE AS ENTERED IN THE RECORDS OF THE REGISTRAR’S OFFICE, COLUMBIA UNIVERSITY

Grade   Equivalent        Grade   Equivalent
A+        15              C-         7
A         14              D+         6
A-        13              D          5
B+        12              D-         4
B         11              F+         2
B-        10              F          1
C+         9              F-         0
C          8

Other notations                                               Equivalent
Abs. (absent from examination)                                Half value at average grade
Dro. (dropped from class)                                     1
H (attendance credit only)                                    Half value at average grade
Inc. (incomplete)                                             1
N (no credit allowed for more than one course with Grade D)   Full value with Grade D
NC (no credit by reason of irregular attendance)              Half value of course at grade given
NM (no mark on record)                                        1
Wd. (withdrawn from course)                                   0
X (credit conditioned upon satisfactory completion of
   second half year of course)                                Full value at grade assigned
# (course credited for entrance only)                         Full value at average grade
* (credit reduced for excessive absence)                      Full value at grade assigned
** (credit allowed with warning as to excessive absences)     Full value at grade assigned
*** (credit withheld pending receipt of excuse for absences)  Half value at average grade


Using the numerical equivalents given in the above table the 
method of computing the scholastic index of an individual is as follows: 





¹ Wood, B. D., op. cit. The table as here given has been modified from that originally used by Dr. Wood and conforms more closely to the revised table which has been used by him since 1922. It may be said that the values assigned to the various letter grades were empirically determined and have been found satisfactory in actual practice.


For each course taken he is credited with the numerical equivalent 
of the grade obtained in the course multiplied by the number of 
points at which the course is valued in the current announcement of 
the University. The total for any one semester is divided by the total 
number of points carried during the semester, in order to get the 
semester average. The total of the semester averages divided by the 
number of semesters during which the individual was in residence 
gives the “general average” for his entire scholastic performance. This average is determined, however, on a basis of 15 (see Table III). Therefore, it may be reduced to a decimal basis, in order to conform to that of the “intelligence index” and the “intelligence score,” by multiplying it by 624. This procedure was followed in every case, and the result may be regarded as the “scholastic index” for each person in the group.
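The procedure just described can be sketched as follows. The course records are hypothetical, only letter grades are handled (the special notations of Table III are left out), and the final rescaling by 100/15, which carries the top grade of 15 to 100, is our assumption about the multiplier rather than a figure taken from the paper.

```python
GRADE_VALUE = {
    "A+": 15, "A": 14, "A-": 13, "B+": 12, "B": 11, "B-": 10,
    "C+": 9, "C": 8, "C-": 7, "D+": 6, "D": 5, "D-": 4,
    "F+": 2, "F": 1, "F-": 0,
}

def semester_average(courses):
    """courses: list of (letter_grade, points) pairs for one semester."""
    total = sum(GRADE_VALUE[g] * pts for g, pts in courses)
    points = sum(pts for _, pts in courses)
    return total / points

def scholastic_index(semesters, scale=100 / 15):
    """Mean of the semester averages, rescaled (assumed factor 100/15)."""
    general_average = sum(semester_average(s) for s in semesters) / len(semesters)
    return general_average * scale

# Hypothetical record: two semesters of (grade, points) pairs
record = [
    [("B", 3), ("C+", 3), ("A-", 2)],
    [("B+", 4), ("C", 3)],
]
print(round(scholastic_index(record), 1))
```

Note that each semester counts equally in the general average, however many points it carried, which is what dividing the total of the semester averages by the number of semesters implies.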

On the basis of the data accumulated as indicated above the fol- 
lowing correlations were obtained, all of which have been computed 
by the rank-difference method and the use of the Spearman formula. 


TABLE IV. SUMMARY OF CORRELATIONS


Correlation between

1. Total number of plus scores and scholastic index ....................  .467
2. Total number of minus scores and scholastic index ...................  .334
3. Total number of zero scores and scholastic index ....................  .303
4. Total number of minus and zero scores combined and scholastic index ..  .271
5. Total value of plus scores and scholastic index ..................... −.457
6. Total value of minus scores and scholastic index ....................  .230
7. Intelligence index and scholastic index ............................. −.271
8. Intelligence score and scholastic index ............................. −.126
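For readers who wish to reproduce coefficients of this kind, the rank-difference computation can be sketched as follows, assuming the standard Spearman formula rho = 1 − 6Σd²/[n(n² − 1)], where d is the difference between an individual's two ranks. Giving tied scores the mean of the ranks they share is an assumption of this sketch, since the study does not say how ties were treated; with many ties the formula is only approximate.

```python
def average_ranks(values):
    """Rank from 1 upward, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_difference_rho(x, y):
    """Spearman's rank-difference coefficient: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


print(rank_difference_rho([10, 20, 30, 40], [4, 3, 2, 1]))  # -1.0
```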


THE SIGNIFICANCE OF NEGATIVE SCORES 


Although no general statement can be made regarding any of the 
results of the preliminary study because of the small number of cases 
involved, nevertheless it will be seen from an inspection of the table 
of correlations above that the significance of the minus scores is 
decidedly marked. In fact, the minus scores are the only ones which 
give a plus correlation with the scholastic performance as regards 
both their number and their value. The number of negative scores 
has a correlation of .344 with the scholastic performance and the value 
of these same scores shows a correlation of .230 with scholastic per- 
formance. It was thought worth while, therefore, to carry this phase 
of the matter further by treating the negative scores from a slightly 
different point of view. 













Predictive Value of Scores in Tests 459 


An inspection of the intelligence examination papers of the 33 
individuals showed that most of the minus scores occurred in the 
following tests: 

Part II, Test 7 (Mechanical information) 

Test 8 (True-false—hard) 

Part III, Test 2 (Sentence completion) 


The mechanical information test is made up of rather difficult 
items, which demand a certain degree of technical knowledge before 
they can be answered correctly. Hence, the average person with 
only four minutes to answer 10 questions prefers to guess at two or 
three of them instead of leaving them all blank. The questions are 
so phrased that a guess usually turns out to be wrong, because of the 
highly specialized information necessary for correct answers; hence 
the frequency of the minus scores here. 

In Part II, Test 8, there is a list of 60 statements to be marked T 
or F accordingly as they may be true or false. The number is so large 
that the subject is almost lured on to take chances, though he is 
usually cautious enough not to allow the number of his guesses to 
exceed the number of answers of which he is reasonably sure. The 
time allowed for this test is 13 minutes, and consequently there is 
usually ample opportunity for taking chances. 

The sentence completions of Part III, Test 2, are all quite difficult. 
The sentences have no immediate context from which the subject 
may judge of the correctness of his answers, and in many instances 
several words may be supplied in the same blank space and the 
sentence will still apparently make sense. Although 20 minutes are 
allowed for this test, many items are likely to be wrong because of 
the subject’s lack of any criterion of correctness except his own infor- 
mation. 

It may be said, therefore, that these three tests seem to have 
special significance as far as minus scores are concerned, though the 
extent of this value is yet to be determined. 

Outside of the plus scores which, of course, form the most important 
part of an individual’s final score, the correlations obtained would 
seem to indicate that the minus and the zero scores offer the most 
promising field for investigation in a more extended study of per- 
formance on the tests. Whether or not these types of scores possess 
any real importance will be the subject to be dealt with in the suc- 
ceeding chapter. 




As an illustration of the difference in scholarship between indi- 
viduals showing a large number of minus scores and those showing 
a small number the following may be mentioned: 

The entire group of 33 individuals was divided into two smaller 
groups of 16 and 17 respectively on the basis of the actual number of 
minus scores made by each person. For Group I the range of minus 
scores was from 4 to 23, with an average of 15.6; for Group II the 
range was from 23 to 40 with an average of 31.7, or more than twice 
as many as the average for Group I. Between the average intelli- 
gence scores of the two groups there was a difference of only 1.2, and 
for all practical purposes this difference is a negligible one. The two 
groups may, therefore, be considered as having approximately the 
same intelligence. The average intelligence index for Group I was 
64.8, with an average scholastic index of 55.6 (nearly D+ on the 
letter scale), while for Group II the average intelligence index was 2.8 
points higher, or 67.6, but for this group the average scholastic index 
was only 47.0 (nearly D− on the letter scale), i.e., 8.6 points lower
than that of Group I. If similar results were obtained for a con- 
siderable number of cases, it would seem to show that the number of 
minus scores made by an individual is of considerable value in pre- 
dicting his probable scholastic performance. In other words, the 
larger the number of minus scores, the lower the scholastic index of 
the individual is likely to be (other things being equal). 

The results of the comparison indicated above may be summarized 
in Table V. 


TABLE V. SHOWING THE COMPARISON BETWEEN TWO GROUPS ARRANGED ON THE
BASIS OF THE NUMBER OF MINUS SCORES MADE BY EACH IN THE THORNDIKE
INTELLIGENCE EXAMINATION


                                            GROUP I        GROUP II
Number of individuals ....................  16             17
Range of minus scores ....................  4-22           23-40
Average number of minus scores ...........  15.6           31.7
Average intelligence score ...............  74.4           73.2
Average intelligence index ...............  64.8           67.6
Average scholastic index .................  55.6 (= D+)    47.0 (= D−)¹


¹ In this part of the study the figures are given on a basis of 100 instead of
1 as previously. This is for the purpose of making the differences between the two
groups more apparent than they otherwise would be.
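Forming the two groups of Table V amounts to sorting the 33 individuals by their number of minus scores and cutting the ordered list after the sixteenth. A sketch of that procedure follows. The records are hypothetical, and `split_by_minus_scores` is an illustrative name, not something taken from the study. Ties at the cut are broken by the original order of the records, since Python's sort is stable.

```python
from statistics import mean


def split_by_minus_scores(people, n_low):
    """Order individuals by number of minus scores; the n_low lowest form Group I."""
    ordered = sorted(people, key=lambda p: p["minus"])

    def summarize(group):
        minus = [p["minus"] for p in group]
        return {
            "n": len(group),
            "range": (min(minus), max(minus)),
            "avg_minus": mean(minus),
            "avg_scholastic": mean(p["scholastic"] for p in group),
        }

    return summarize(ordered[:n_low]), summarize(ordered[n_low:])


# Hypothetical records for illustration only (not data from the study).
people = [
    {"minus": 5, "scholastic": 60.0},
    {"minus": 30, "scholastic": 45.0},
    {"minus": 12, "scholastic": 58.0},
    {"minus": 35, "scholastic": 50.0},
]
group_1, group_2 = split_by_minus_scores(people, n_low=2)
print(group_1["avg_scholastic"] - group_2["avg_scholastic"])  # 11.5
```

With the study's 33 records, the call would use `n_low=16`.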


Summary.—The results of the study would seem to indicate that 
both the number and the value of the minus scores, as well as the
number of zero scores, possess more or less significance when taken
in their relation to intelligence test performance, and it is entirely 
possible that they might show even greater importance in a larger 
group of individuals. If the minus and zero scores combined be taken 
as an index of the “willingness” or “unwillingness” to “take a
chance,” or of “caution” and “rashness” in the case of the individuals
tested, it is quite obvious that what one does on an “intelligence
examination” (so-called) is the product of something more than
strictly intellectual factors. There must be character traits called 
into play as well, and these leave behind them more or less evidence of 
their presence. A similar remark may be made regarding the part 
played by intelligence in tests devised primarily for character traits, 
since the former is necessary for the satisfactory performance of nearly 
every test of whatever nature. 

In the writer’s opinion, the time is not far distant when a single
“composite” test will be employed to test an individual’s intelligence
and the most important phases of his character at one and the same
time. In a subsequent study it is hoped to discuss further possibilities
along this line.







THE STANDARD ERROR OF CERTAIN ESTIMATED 
COEFFICIENTS OF CORRELATION 


EUGENE SHEN 


Stanford University 


When the reliability coefficient of a test is determined from the 
correlation between its two halves by means of the Spearman-Brown 
Formula, instead of directly correlating an entire form with another, 
certain assumptions are involved and its standard error is greater 
than that given by the ordinary formula for the standard error of a 
correlation coefficient. A specific formula ought to be available and is 
here derived. 








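$$r_{11} = \frac{2\,r_{\frac12\frac12}}{1 + r_{\frac12\frac12}}, \qquad dr_{11} = \frac{2\,dr_{\frac12\frac12}}{\left(1 + r_{\frac12\frac12}\right)^2}, \qquad \sigma_{r_{11}} = \frac{2\,\sigma_{r_{\frac12\frac12}}}{\left(1 + r_{\frac12\frac12}\right)^2} \qquad (1)$$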


Since $r_{\frac12\frac12}$ is directly calculated from original data, $\sigma_{r_{\frac12\frac12}} = \dfrac{1 - r_{\frac12\frac12}^2}{\sqrt{N}}$.
Substituting this in (1), we have


$$\sigma_{r_{11}} = \frac{2\left(1 - r_{\frac12\frac12}\right)}{\sqrt{N}\left(1 + r_{\frac12\frac12}\right)} \qquad (1a)$$
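Solving the Spearman-Brown Formula for $r_{\frac12\frac12}$, which gives $r_{\frac12\frac12} = r_{11}/(2 - r_{11})$, and substituting in (1a), we get

$$\sigma_{r_{11}} = \frac{2\left(1 - r_{11}\right)}{\sqrt{N}} \qquad (1b)$$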


It can be seen, especially from (1b), that unless $r_{11}$ is perfect, its standard error is always greater than $\dfrac{1 - r_{11}^2}{\sqrt{N}}$, for $2(1 - r_{11}) > (1 - r_{11})(1 + r_{11}) = 1 - r_{11}^2$ whenever $r_{11} < 1$.





For a numerical example, suppose $r_{\frac12\frac12} = .40$; then

$$r_{11} = \frac{.80}{1.40} = .5714, \qquad \sigma_{r_{11}} = \frac{2(1 - .5714)}{\sqrt{N}} = \frac{.8572}{\sqrt{N}}.$$













Error of Estimated Coefficients of Correlation 463 


If we had applied the usual formula $\sigma_r = \dfrac{1 - r^2}{\sqrt{N}}$, we would have obtained $(1 - .5714^2)/\sqrt{N} = .6735/\sqrt{N}$, which is much too small. Therefore, wherever the reliability of a test is estimated by the Spearman-Brown Formula, its standard error should always be calculated by one of the three specific formulas here given, and formula (1b) is especially recommended for its simplicity.
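The arithmetic of the numerical example can be checked with a short sketch of formulas (1a) and (1b) alongside the usual formula. The function names are illustrative. Setting N = 1 prints the coefficients of 1/√N. Working from the unrounded $r_{11} = 4/7$ gives .8571; the printed .8572 comes from rounding $r_{11}$ to .5714 before substituting.

```python
import math


def spearman_brown_half(r_half):
    """Reliability of the whole test from the correlation between its halves."""
    return 2 * r_half / (1 + r_half)


def se_from_halves(r_half, n_cases):
    """Formula (1a): 2(1 - r_half) / (sqrt(N) * (1 + r_half))."""
    return 2 * (1 - r_half) / (math.sqrt(n_cases) * (1 + r_half))


def se_from_whole(r_whole, n_cases):
    """Formula (1b): 2(1 - r_11) / sqrt(N)."""
    return 2 * (1 - r_whole) / math.sqrt(n_cases)


def se_usual(r, n_cases):
    """Ordinary standard error of a directly computed r: (1 - r^2) / sqrt(N)."""
    return (1 - r * r) / math.sqrt(n_cases)


r11 = spearman_brown_half(0.40)
print(round(r11, 4))                    # 0.5714
print(round(se_from_whole(r11, 1), 4))  # 0.8571
print(round(se_usual(r11, 1), 4))       # 0.6735
```

Formulas (1a) and (1b) give the same value for any N, since (1b) is (1a) rewritten in terms of $r_{11}$.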

Next, let us consider the more general situation where the correlation between the average score on $n$ forms of a test and $n$ other forms is estimated from the correlation between any two single forms. Then, by a similar process,


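$$\sigma_{r_{nn}} = \frac{n\,\sigma_{r_{11}}}{\left[1 + (n-1)\,r_{11}\right]^2} = \frac{n\left(1 - r_{11}^2\right)}{\sqrt{N}\left[1 + (n-1)\,r_{11}\right]^2} \qquad (2)$$

where

$$r_{nn} = \frac{n\,r_{11}}{1 + (n-1)\,r_{11}}$$

and $r_{11}$ here denotes the correlation between two single forms.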





Formula (2) is the general formula from which formula (1) may be 
derived, and the general relations discussed in connection with the 
latter also hold here. 
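A sketch of the general case follows; the function names are illustrative. It checks the two relations stated in the text: with n = 2, formula (2) reproduces formula (1) (the single-form correlation of (2) playing the part of the half-test correlation of (1)), and with n = 1 it reduces to the ordinary standard error (1 − r²)/√N.

```python
import math


def spearman_brown(r_single, n):
    """Estimated r_nn between averages of n forms, from r_11 between single forms."""
    return n * r_single / (1 + (n - 1) * r_single)


def se_spearman_brown(r_single, n, n_cases):
    """Formula (2): n (1 - r_11^2) / (sqrt(N) * [1 + (n - 1) r_11]^2)."""
    return n * (1 - r_single ** 2) / (math.sqrt(n_cases) * (1 + (n - 1) * r_single) ** 2)


print(round(spearman_brown(0.40, 2), 4))          # 0.5714
print(round(se_spearman_brown(0.40, 2, 1), 4))    # 0.8571
print(round(se_spearman_brown(0.40, 1, 1), 4))    # 0.84
```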

When the correlation coefficient between two tests or two functions is “corrected” for attenuation, it is also a case of estimation and involves certain assumptions. The “corrected” coefficient is a measure of the maximum possible correlation and has a standard error greater than that of a “raw” correlation coefficient. If we call two forms of one test or two tests of one function Test 1 and Test 3, and call two forms of another test or two tests of another function Test 2 and Test 4, then $r_{13}$ is the reliability of the first test or the reliability with which the first function is tested, $r_{24}$ is a similar measure for the second test or the second function, $r_{12}$, $r_{14}$, $r_{23}$, and $r_{34}$ are cross correlations between the two tests as represented by the different forms or between the two functions as represented by the different tests, and $r_{\infty\infty}$ is the “corrected” correlation, or the estimated true correlation between the two tests or two functions in case they were perfectly reliable or had perfectly reliable measures. There are three different methods of calculating the “corrected” coefficient: besides $r_{13}$ and $r_{24}$, the two reliability measures, the cross correlation is represented either (a) by any one of the four coefficients, (b) by their geometric mean, or (c) by their arithmetic mean.




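$$\text{(a)}\quad r_{\infty\infty} = \frac{r_{12}}{\sqrt{r_{13}r_{24}}} \text{ or } \frac{r_{14}}{\sqrt{r_{13}r_{24}}} \text{ or } \frac{r_{23}}{\sqrt{r_{13}r_{24}}} \text{ or } \frac{r_{34}}{\sqrt{r_{13}r_{24}}}$$

$$\text{(b)}\quad r_{\infty\infty} = \frac{\left(r_{12}\,r_{14}\,r_{23}\,r_{34}\right)^{1/4}}{\sqrt{r_{13}r_{24}}}$$

$$\text{(c)}\quad r_{\infty\infty} = \frac{\bar r}{\sqrt{r_{13}r_{24}}}, \qquad \bar r = \frac{r_{12} + r_{14} + r_{23} + r_{34}}{4}$$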


The third form is particularly useful when the cross correlation is calculated from the sum or average of Test 1 and Test 3 with that of Test 2 and Test 4. Then the four individual cross correlations are not known, but their arithmetic mean can be calculated by formula (161b) given in Dr. Kelley’s “Statistical Method”:


$$\bar r = \frac{r_{(1+3)(2+4)}\sqrt{\left(1 + r_{13}\right)\left(1 + r_{24}\right)}}{2}$$
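The three ways of forming the corrected coefficient, and the recovery of the mean cross correlation from the correlation of sums, can be sketched as follows. The function names are illustrative. `mean_cross_from_sums` assumes that all four tests have equal standard deviations; under that assumption the correlation between (Test 1 + Test 3) and (Test 2 + Test 4) equals $2\bar r/\sqrt{(1 + r_{13})(1 + r_{24})}$. Form (b) is only defined when all four cross correlations are positive.

```python
import math

# Tests 1 and 3 measure one function, Tests 2 and 4 the other.
# r13 and r24 are the reliabilities; r12, r14, r23, r34 are the cross correlations.


def corrected_single(r_cross, r13, r24):
    """Form (a): any one cross correlation divided by sqrt(r13 * r24)."""
    return r_cross / math.sqrt(r13 * r24)


def corrected_geometric(r12, r14, r23, r34, r13, r24):
    """Form (b): geometric mean of the four cross correlations (all must be positive)."""
    return (r12 * r14 * r23 * r34) ** 0.25 / math.sqrt(r13 * r24)


def corrected_arithmetic(r12, r14, r23, r34, r13, r24):
    """Form (c): arithmetic mean of the four cross correlations."""
    r_bar = (r12 + r14 + r23 + r34) / 4
    return r_bar / math.sqrt(r13 * r24)


def mean_cross_from_sums(r_sums, r13, r24):
    """Recover the mean cross correlation from r between (Test 1 + Test 3) and
    (Test 2 + Test 4), assuming all four tests have equal standard deviations."""
    return r_sums * math.sqrt((1 + r13) * (1 + r24)) / 2


print(corrected_arithmetic(0.40, 0.50, 0.50, 0.60, 0.50, 0.50))
```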


The standard error of the coefficient of correlation corrected for 
attenuation takes different values according to the way the corrected 
coefficient is calculated. Dr. Kelley has given the standard error for 
the first two cases. A similar formula for the third case is here derived. 
Taking logarithmic differentials, 

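$$\frac{dr_{\infty\infty}}{r_{\infty\infty}} = \frac{d\bar r}{\bar r} - \frac{dr_{13}}{2r_{13}} - \frac{dr_{24}}{2r_{24}} = \frac{dr_{12} + dr_{34} + dr_{14} + dr_{23}}{4\bar r} - \frac{dr_{13}}{2r_{13}} - \frac{dr_{24}}{2r_{24}}$$

After squaring, summing, dividing by N, substituting by formulas (108b), (128), and (129) in Kelley’s text, collecting terms, and simplifying, we finally get the following: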











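$$\sigma^2_{r_{\infty\infty}} = r^2_{\infty\infty}\left[\frac{\sigma^2_{\bar r}}{\bar r^2} + \frac{\sigma^2_{r_{13}}}{4r_{13}^2} + \frac{\sigma^2_{r_{24}}}{4r_{24}^2} - \frac{\sigma_{\bar r\,r_{13}}}{\bar r\,r_{13}} - \frac{\sigma_{\bar r\,r_{24}}}{\bar r\,r_{24}} + \frac{\sigma_{r_{13}r_{24}}}{2\,r_{13}r_{24}}\right] \qquad (3)$$

where $\sigma^2_{r_{13}} = (1 - r_{13}^2)^2/N$ and $\sigma^2_{r_{24}} = (1 - r_{24}^2)^2/N$; $\sigma^2_{\bar r} = \frac{1}{16}\sum_a\sum_b \sigma_{r_a r_b}$, $\sigma_{\bar r\,r_{13}} = \frac14\sum_a \sigma_{r_a r_{13}}$, and $\sigma_{\bar r\,r_{24}} = \frac14\sum_a \sigma_{r_a r_{24}}$, with $a$ and $b$ running over 12, 14, 23, 34; and the sampling covariance of any two coefficients is

$$N\sigma_{r_{ij}r_{kl}} = \tfrac12 r_{ij}r_{kl}\left(r_{ik}^2 + r_{il}^2 + r_{jk}^2 + r_{jl}^2\right) + r_{ik}r_{jl} + r_{il}r_{jk} - r_{ij}\left(r_{ik}r_{il} + r_{jk}r_{jl}\right) - r_{kl}\left(r_{ik}r_{jk} + r_{il}r_{jl}\right), \qquad r_{ii} = 1.$$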







This is the proper formula to use when the arithmetic mean of the four cross correlation coefficients is used in calculating the corrected correlation. However, formula (161d) in Dr. Kelley’s text, which was primarily derived for the case where the geometric mean was used in the correction, will serve as a remarkably close approximation. The exact error introduced by such an approximation is difficult to state in general terms and is here shown only by a special example. If $r_{12} = .40$, $r_{14} = .50$, $r_{23} = .50$, $r_{34} = .60$, and $r_{13} = r_{24} = .50$, then $\bar r = .50$, $r_{\infty\infty} = 1.00$, and the exact standard error is $\sqrt{.7238/N}$, while its approximation is $\sqrt{.7500/N}$, the difference being less than 2 per cent. For ordinary situations, therefore, no material error will be introduced by the approximation, and when the mean cross correlation is calculated without a knowledge of the four individual coefficients only the shorter formula is applicable. The long formula is recommended only where utmost accuracy is desired, and especially where $r_{12}$, $r_{14}$, $r_{23}$, and $r_{34}$ differ from their mean to a considerable extent.
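The exact value .7238 in this example can be checked numerically. The sketch below is not a term-by-term transcription of the printed formula (3). Instead, it evaluates the same sampling variance directly from the logarithmic differential of form (c), weighting each $dr$ by its coefficient, and uses the standard large-sample (Pearson-Filon) expression for the covariance of two correlation coefficients. Treating that expression as equivalent to what Kelley's formulas supply is an assumption of the sketch; with $r_{ij} = r_{kl}$ it reduces to the familiar $(1 - r^2)^2$.

```python
import math


def pf_cov(R, i, j, k, l):
    """Large-sample N * Cov(r_ij, r_kl) (Pearson-Filon); R is a correlation matrix."""
    r = lambda a, b: R[a][b]
    return (0.5 * r(i, j) * r(k, l) * (r(i, k) ** 2 + r(i, l) ** 2 + r(j, k) ** 2 + r(j, l) ** 2)
            + r(i, k) * r(j, l) + r(i, l) * r(j, k)
            - r(i, j) * (r(i, k) * r(i, l) + r(j, k) * r(j, l))
            - r(k, l) * (r(i, k) * r(j, k) + r(i, l) * r(j, l)))


def corrected_r_and_n_variance(R):
    """Return r_inf for form (c) and N times its sampling variance.

    Tests 1-4 are indices 0-3, so r12 = R[0][1], r13 = R[0][2], and so on."""
    cross = [(0, 1), (0, 3), (1, 2), (2, 3)]  # r12, r14, r23, r34
    rbar = sum(R[i][j] for i, j in cross) / 4
    r13, r24 = R[0][2], R[1][3]
    r_inf = rbar / math.sqrt(r13 * r24)
    # Coefficient of each dr in d(r_inf) / r_inf.
    weights = {pair: 1 / (4 * rbar) for pair in cross}
    weights[(0, 2)] = -1 / (2 * r13)
    weights[(1, 3)] = -1 / (2 * r24)
    n_var_log = sum(wa * wb * pf_cov(R, *a, *b)
                    for a, wa in weights.items() for b, wb in weights.items())
    return r_inf, r_inf ** 2 * n_var_log


R = [[1.0, 0.4, 0.5, 0.5],
     [0.4, 1.0, 0.5, 0.5],
     [0.5, 0.5, 1.0, 0.6],
     [0.5, 0.5, 0.6, 1.0]]
r_inf, n_var = corrected_r_and_n_variance(R)
print(round(r_inf, 4), round(n_var, 4))  # 1.0 0.7238
```

The standard error is then `math.sqrt(n_var / N)` for a sample of N cases.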

The standard-error formulas here given were all derived at the 
suggestion of my teacher, Dr. T. L. Kelley, to whom acknowledgment 
is also due for checking the results. 










REQUEST FOR INFORMATION IN A STUDY OF THE
EFFECT OF ENVIRONMENT ON INTELLIGENCE


The evidence concerning the relative importance of heredity and 
environment has commonly been of an indirect sort and has been 
somewhat difficult to interpret. We are much in need of a crucial 
experiment in the matter, or at least of a type of evidence which will 
resemble that which might be gained by means of a crucial experiment. 
Such an experiment would enable us to isolate and measure separately 
the effect of each of the two factors. Something corresponding to 
an experimental isolation of the two factors occurs in the case of those 
children who are adopted in infancy and are placed in an environment 
which is radically different from that in which they would have lived 
if they had remained in their own family. In this case the effect of 
environment may be measured by comparing the intelligence test 
scores of children adopted in infancy with the scores made by their 
brothers and sisters by blood. We may measure the effect of heredity, 
on the other hand, by comparing the scores of these adopted children 
with those of the children in the family into which they were adopted, 
in case there are such children. 

I propose to gather statistics on this matter and to make tests of 
all available cases of the sort which have been mentioned. Will 
the readers of this note communicate to me the names of persons who 
have adopted children in infancy and who might be willing to cooperate 
in this study? 

Frank N. FREEMAN. 










NEW PUBLICATIONS IN EDUCATIONAL
PSYCHOLOGY AND RELATED FIELDS OF
EDUCATION


CONDUCTED BY LAURA ZIRBES 











DR. BOBBITT TELLS “HOW”


How to Make a Curriculum, by Franklin Bobbitt. The Riverside 
Press, Cambridge, Mass., 1924. Pp. 292.


The author’s purpose in this volume is to give curriculum 
workers over the country a feasible program or technique of curriculum 
construction and enough illustrative types of procedure to enable 
them to carry out their work according to his philosophy. As subsid- 
iary purposes he sets about to show the tentative nature of his own 
statements, the necessity for making all changes gradually, the signifi- 
cance and place of scientific or quantitative studies, the necessity 
for realizing that general and vocational objectives are clearly disparate,
and the long-range possibilities of coordinated curriculum revision
in widely diverse and scattered centers as a means of arriving at a
common core of general training. Objectives are set up by the following
steps.

1. As a premise, a general aim of education is assumed. 

2. In the light of that aim or premise the main lines of adult 
activity which make for well rounded adjusted adult life are selected. 

3. After setting up nine general headings and one vocational cate- 
gory, activities are listed under each classification, these having been 
gathered partly by direct observation of life situations of adults. The 
author is willing to utilize the unanimous judgment of expert analysts 
or observers without direct observation whenever possible. 

4. A further assumption is made, namely, that the abilities which
enable people to carry on these listed activities are the objectives of
education. Each ability is recognized as a complex unit and is
analyzed into specific component abilities, and activities which will
provide practice and lead to its development are listed. The author
agrees that these more minute objectives and their limitations should
legitimately be determined by careful quantitative bases of selection and
rejection when other means fail to secure general agreement.

1 Unsigned reviews were prepared by L. Z.

Those
who are interested in the improvement of instruction by any means 
whatever should read this volume, to see how the author proposes to 
reach that objective. Those who are particularly interested in 
changing the present curriculum of a school or a community will find 
that Bobbitt and the 2700 workers who have been associated with him 
in curriculum work have much to contribute from their experience. 
Investigators and others who are concerned with a critical study of 
the theory and principles of curriculum construction will wish to 
compare other philosophies of values, aims, and objectives with Bob- 
bitt’s and will wish to question his basic assumptions and the premise 
upon which the whole structure of his curriculum technique rests. 
Some will immediately question the validity of the method of gather- 
ing and selecting activities. Others will begin disagreeing with the 
emphasis given adult life values, and the underemphasis on that
third or fourth of life which precedes maturity and has values in 
itself. Some will wonder why vocational education is so definitely 
postponed and restricted by an author who begins with adult values. 
The analysis does break up the adult values into stages of growth and 
corrects some of the dangers of emphasis implied in the first few pages 
of the volume. 

Because of the numerous and urgent requests for cooperation
and advice on curriculum revision, workers are no doubt justified in
setting down their technique somewhat didactically, thus saving
their energies while spreading and coordinating their efforts. This
should not hamper but encourage the critical examination of such
volumes by those who are qualified to compare and question basic
assumptions and techniques.





HOW DOES COACHING INFLUENCE MENTAL TEST SCORES?


Influence of Training on Intelligence Tests, by Katharine B. Graves. 
New York: Teachers College Series, Columbia University Contribu- 
tions to Education. No. 143. New York, 1924. Pp. III + 78. 
Every psychological examiner who has been confronted with the
necessity for rating a “coached” subject will be interested in this
problem. After a preliminary experiment with 12 children, plans
were organized for work with 153 cases divided into four groups as
follows: a control group; two coached groups, one of which had
slightly more training and a greater interval between tests than the
other; and one group which received training with elements deemed
comparable and similar to those in the tests but no coaching on the
test elements themselves.

Allowance for growth in the interval between tests was made on 
the basis of elapsed time. Reasons for this decision and other details 
of organization and procedure are fully reported. The elements used 
with the group which received “similar” training are placed in an 
appendix and the claims for equivalence or similarity to the respective 
elements of the original test may thus be examined. The amount 
and precise nature of coaching is described for each element. 

Several of the 14 conclusions are quoted because of their implica- 
tions for further research. 

“Direct coaching on the material of the tests is extremely effective, 
even when the time given to the coaching is small. 

“The effect of such direct coaching persists to a large degree for 
a period of three or four months, at least. 

“Indirect coaching, or training in work similar to the material of 
the tests, is also effective, though to a much smaller degree. This 
effect is serious enough to be considered. 

“The effect of such indirect coaching persists to a large degree for 
a period of three or four months, at least. 

“Several alternative forms of the individual examination series 
are needed, in order to measure intelligence with as little disturbance 
as possible from coaching, conscious or unconscious. 

“An effort should be made to find tests which are noncoachable. 
Since many tests of this type do not differentiate well, it would be 
advisable to give a practice period on material used in intelligence 
tests, and make comparisons between children on the basis of subse- 
quent testing. 

“The control group gains considerably from the repetition of the 
test. Continued testing gives a child an undue advantage over a 
child who has never had the test before. 

“None of these results gives us material for deciding the nature of 
intelligence. They do point out some dangers from which we must 
safeguard future tests of intelligence. 

“Since the effect of closely similar training is so slight after the 
lapse of a year, it seems probable that the differences in the educative 
ability of various school systems have little differentiating effect on 
the standing of their pupils in intelligence examinations.” 




AN ANALYTICAL STUDY OF INTELLIGENCE TEST ELEMENTS


A Study of Intelligence Test Elements, by Leona Vincent. Teachers 
College Series of Columbia University Contributions to Educa- 
tion. No. 152. New York, 1924. 


This worker contends that the next step in the improvement of
intelligence tests should consist in the analysis of test elements into
the smaller unit reactions which go to make up the various test reac- 
tions. Because there was no available method of analysis, the inves- 
tigation and study of method was a necessary first step. 

All terms and materials used are described in Chap. II. Chapter
III reports the initial tryout of methods. The Pearson product-moment
method was tried out on 56 elements but rejected. Pearson’s
mean square contingency coefficient was considered. When examination
of the scoring keys showed very little difference between the
qualities of answers marked +3 and +2, a considerable difference
between answers marked +2 and +1, but very little difference
between answers marked +1, 0, −1, and −2, the investigator was
led to consider the element scores as “passed” or “failed,” putting the
division point between 2 and 1. The criterion consisted of a many-categoried
variable and the element scores were considered as a
dichotomous variable. To preserve the discriminating power made
possible by the many-categoried criterion it was necessary to reject
Yule’s coefficient of association and Kelley’s tetrachoric r.

The method of bi-serial correlation was being tried out when the 
method of overlapping (Thorndike) suggested itself. The two latter 
methods were compared on the basis of 35 computations, and it was 
decided to use the latter because it compared favorably with the 
more laborious method and thus made the study of a larger number of 
test elements possible in a given period. 

Completion elements, reading elements and arithmetic elements 
were subjected to analysis. This is a pioneer study, and is perhaps 
more valuable as an indication of a new approach to test construction 
and criticism than in any other way. 





A MEASUREMENT PRIMER 


Beginnings in Educational Measurement, by Edward A. Lincoln. 
Philadelphia: Lippincott, 1924. Pp. 151. 


This recent addition to Lippincott’s Educational Guides aims to 
provide an informational background in measurement for short
courses where the actual class time may be devoted largely to the
examination and trial of actual tests and scales. It succinctly relates 
the history of the measurement movement, gives a minimum of 
mathematics necessary to understand common statistical terms, 
contrasts the use and misuse of tests, sketches the development of 
the measurement of intelligence, introduces the subject of character
measurement, and reviews some of the more important principles for 
guiding teachers in the use of standard tests. It is exceptionally clear 
in its expositions, sane in its warnings and criticisms, and wisely 
discriminating in its omissions. It names few tests and describes 
none. It merely offers a list of research bureaus and publishers from 
whom these may be obtained. 

There is use for this kind of measurement book in extension courses 
and in short summer courses taken by teachers in service. It is also 
to be recommended for independent reading where only the ele- 
mentary essentials of this subject are desired. 

M. H. WILLING.


A “GUIDANCE” SCHOOL


Junior High School Life, by Emma V. Thomas-Tindal and Jessie 
DuVal Myers. New York: Macmillan, 1924. Pp. XIX+287. 


This is a remarkably fine description of a junior high school whose 
aim from the start has been clearly conceived and persistently kept in 
view, and whose machinery in every detail has been designed to keep 
it moving toward that aim. The Holmes School of Philadelphia, of 
which the authors are, respectively, the principal and an instructor, 
appears to have achieved an integration of the sort most ardently 
hoped for by the leading promoters of the junior high school. In its 
organization and operation it has been dominated by the idea of 
“guidance,” interpreted broadly enough to include the physical, 
curricular, social, vocational, civic, avocational, and ethical phases of 
adolescent life. The authors emphasize particularly, and with con- 
siderable illustration, what this idea has meant in developing effective 
teacher counselorship, in bringing about a truly educative student 
participation in school government, in providing for “scholastic
reënforcement,” in instituting “grade forums,” and, chief of all, in
making actually intra-curricular a host of pupil interests and activities 
traditionally separated from the real business of schooling. The list 
of the school clubs is a long and diversified one, and even a very brief 
outlining of their aims and activities fills a chapter of 90 pages. The
authors make one feel, however, that these clubs perform so great a
service in carrying out the purpose of the school that they fully justify 
the great expenditure of time and energy which they must require. 

The book is a genuine contribution to the literature of the junior 
high school. It should be of immense interest and value to principals 
and teachers now struggling with junior high problems, because it 
shows so concretely how to translate principle into practice in what 
are doubtless the most puzzling phases of the life of this new school. 

M. H. WILLING. 





PSYCHOLOGY QUESTIONS


Guide to Educational and General Psychology, by John P. Wynne. 
New York: Fordham Publishing Co., 1924. Pp. VI+84.


This little book presents over fifteen hundred questions designed 
to steer beginning pupils through the elements of the subjects named. 
It contemplates the selective reading of several standard texts on each 
of the 30 topics covered by the questions, instead of the intensive 
assimilation of any one book. A list of carefully paged references 
follows each set of questions; and an excellent, up-to-date bibliography 
takes up the last four pages. “The topic headings and the general
trend of the whole indicate a functional point of view.” Young or
substitute instructors in psychology not yet equipped with their own 
compilations of this sort will find use for the book. 

M. H. WILLING.






















